IRBIS: a systematic search for conserved complementarity

RNA. 2014 Oct;20(10):1519-31. doi: 10.1261/rna.045088.114. Epub 2014 Aug 20.

Abstract

IRBIS is a computational pipeline for detecting conserved complementary regions in unaligned orthologous sequences. Unlike other methods, it follows the "first-fold-then-align" principle, in which all possible combinations of complementary k-mers are searched for simultaneous conservation. A novel trimming procedure reduces the size of the search space and improves performance to the point where large-scale analyses of intra- and intermolecular RNA-RNA interactions become possible. In this article, I provide a rigorous description of the method, benchmarking on simulated and real data, and a set of stringent predictions of intramolecular RNA structure in placental mammals, drosophilids, and nematodes. I discuss two particular cases of long-range RNA structures that are likely to have a causal effect on single- and multiple-exon skipping, one in the mammalian gene Dystonin and the other in the insect gene Ca-α1D. In Dystonin, one of the two complementary boxes contains a binding site for the Rbfox protein similar to one recently described in the Enah gene. I also report that snoRNAs and long noncoding RNAs (lncRNAs) have a high capacity for base-pairing with introns of protein-coding genes, suggesting possible involvement of these transcripts in splicing regulation. Finally, I find that conserved sequences that are equally likely to occur on either strand of DNA (e.g., transcription factor binding sites) contribute strongly to the false-discovery rate and, therefore, would confound every such analysis. IRBIS is open-source software available at http://genome.crg.es/~dmitri/irbis/.
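
The "first-fold-then-align" idea can be illustrated with a toy Python sketch: each sequence is first searched on its own for pairs of complementary k-mers, and only then are the pairs compared across species, with no alignment involved. This is not the IRBIS implementation. The function names, the restriction to Watson-Crick pairs, and the `min_gap` parameter are assumptions made for illustration, and the trimming procedure mentioned in the abstract is not reproduced here.

```python
# Toy illustration only; all names below are hypothetical, not part of IRBIS.
COMPLEMENT = str.maketrans("ACGU", "UGCA")  # assumption: Watson-Crick pairs only


def revcomp(kmer):
    """Return the reverse complement of an RNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_positions(seq, k):
    """Map every k-mer in seq to the list of its start positions."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return positions


def complementary_pairs(seq, k, min_gap=3):
    """'Fold' step: find (left, right) k-mer pairs within one sequence,
    where right is the reverse complement of left and some occurrence of
    right starts at least min_gap bases after an occurrence of left ends."""
    pos = kmer_positions(seq, k)
    pairs = set()
    for kmer, starts in pos.items():
        partner = revcomp(kmer)
        if partner in pos and max(pos[partner]) >= min(starts) + k + min_gap:
            pairs.add((kmer, partner))
    return pairs


def conserved_pairs(seqs, k, min_gap=3):
    """'Align' step, reduced to its simplest form: keep the pairs that
    are present in every orthologous sequence."""
    per_species = [complementary_pairs(s, k, min_gap) for s in seqs]
    return set.intersection(*per_species)
```

For example, `conserved_pairs(["AAGACGUAAAAAACGUCAA", "CCCGACGUCCCCCCCCACGUCC"], k=5)` compares two made-up sequences whose flanks and spacer lengths differ but which share the same pair of complementary boxes. Requiring exact k-mer identity in every species is far stricter than a realistic conservation criterion; it is used here only to keep the sketch short.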

Keywords: Ca-α1D; Dystonin; RNA–RNA interaction; alternative splicing; evolutionary conservation; exon skipping; lncRNA; long-range RNA structure; snoRNA.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Caenorhabditis elegans / genetics*
  • Conserved Sequence / genetics*
  • Drosophila melanogaster / genetics*
  • Exons / genetics*
  • Genes / genetics*
  • Humans
  • Introns / genetics*
  • Molecular Sequence Data
  • RNA Splicing / genetics
  • RNA, Small Nucleolar / genetics
  • Sequence Homology, Nucleic Acid
  • Software*

Substances

  • RNA, Small Nucleolar