Generic eukaryotic core promoter prediction using structural features of DNA

Genome Res. 2008 Feb;18(2):310-23. doi: 10.1101/gr.6991408. Epub 2007 Dec 20.

Abstract

Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions is important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black box models that output predictions that are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • DNA / chemistry*
  • Databases, Genetic
  • Genomics / methods*
  • Humans
  • Promoter Regions, Genetic / genetics*

Substances

  • DNA