Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming

Nucleic Acids Res. 2005 Jun 7;33(10):3263-70. doi: 10.1093/nar/gki644. Print 2005.

Abstract

Several methods exist for predicting non-coding RNA (ncRNA) genes in Escherichia coli (E. coli). In addition to about sixty known ncRNA genes, excluding tRNAs and rRNAs, various methods have predicted more than a thousand ncRNA genes, but only 95 of these candidates were confirmed by more than one study. Here, we introduce a new method that uses automatic discovery of sequence patterns to predict ncRNA genes. The method predicts 135 novel candidates. In addition, the method predicts 152 genes that overlap with predictions in the literature. We test sixteen predictions experimentally, and show that twelve of these are actual ncRNA transcripts. Six of the twelve verified candidates were novel predictions. The relatively high confirmation rate indicates that many of the untested novel predictions are also ncRNAs, and we therefore speculate that E. coli contains more ncRNA genes than previously estimated.
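
The counts quoted above can be turned into the "confirmation rate" with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract. The variable names are illustrative, and summing the novel and overlapping predictions assumes the two groups are disjoint, which the abstract implies but does not state outright.

```python
# Counts taken directly from the abstract.
novel_predictions = 135        # candidates not reported elsewhere
overlapping_predictions = 152  # candidates also predicted in the literature
tested = 16                    # predictions tested experimentally
confirmed = 12                 # tested predictions shown to be ncRNA transcripts
confirmed_novel = 6            # confirmed transcripts that were novel predictions

# Assumption: the novel and overlapping groups do not overlap each other.
total_predictions = novel_predictions + overlapping_predictions

# Fraction of tested predictions that were confirmed.
confirmation_rate = confirmed / tested

# Fraction of confirmed transcripts that came from novel predictions.
novel_share_of_confirmed = confirmed_novel / confirmed

print(f"total predictions: {total_predictions}")
print(f"confirmation rate: {confirmation_rate:.0%}")
print(f"novel share of confirmed: {novel_share_of_confirmed:.0%}")
```

The rate describes only the sixteen tested predictions. How well it carries over to the untested candidates depends on how the tested set was chosen, which the abstract does not describe.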

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Escherichia coli / genetics*
  • Genes, Bacterial*
  • Genes, rRNA
  • RNA, Transfer / genetics
  • RNA, Untranslated / analysis
  • RNA, Untranslated / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • RNA, Untranslated
  • RNA, Transfer