Toward a gold standard for promoter prediction evaluation

Bioinformatics. 2009 Jun 15;25(12):i313-20. doi: 10.1093/bioinformatics/btp191.

Abstract

Motivation: Promoter prediction is an important task in genome annotation projects, and many new promoter prediction programs (PPPs) have emerged in recent years. However, many of these programs are compared inadequately with other programs. In most cases, only a small portion of the genome is used to evaluate a program, which is not a realistic setting for whole-genome annotation projects. In addition, a common evaluation design for properly comparing PPPs is still lacking.

Results: We present a large-scale benchmarking study of 17 state-of-the-art PPPs. We propose a multi-faceted evaluation strategy that can serve as a gold standard for promoter prediction evaluation, allowing authors of promoter prediction software to compare their methods with existing ones in a proper way. We then use this strategy to compare the selected promoter predictors and conduct an in-depth analysis of predictive performance, promoter class specificity, overlap between predictors and positional bias of the predictions.

Availability: The implementations of the four protocols, together with the datasets required to perform the benchmarks, are available to the academic community free of charge on request.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Databases, Genetic
  • Genome, Human
  • Humans
  • Promoter Regions, Genetic*
  • Software / standards*