info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling

Bioinformatics. 2009 Oct 15;25(20):2715-22. doi: 10.1093/bioinformatics/btp490. Epub 2009 Aug 18.

Abstract

Motivation: Discovering cis-regulatory elements in genome sequences remains challenging. Several methods rely on optimizing a target scoring function. The information content (IC), or relative entropy, of a motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually computed as a posteriori statistics rather than optimized during the motif search itself.

Results: We introduce info-gibbs, a Gibbs sampling algorithm that efficiently optimizes the IC or the log-likelihood ratio (LLR) of the motif while keeping computation time low. The method compares well with existing methods such as MEME, BioProspector, Gibbs and GAME on both synthetic and biological datasets. Our study shows that motif discovery techniques can be enhanced by focusing the search directly on the motif IC or the motif LLR.
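
To make the two target scores concrete, the following is a minimal sketch of how the IC and the LLR of a candidate motif might be computed from a set of aligned sites. It is not the info-gibbs implementation. The function names, the uniform background, the use of base-2 logarithms (bits), and the absence of pseudocounts are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet


def column_frequencies(sites):
    """Return per-column base frequencies for equal-length aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    freqs = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        freqs.append({base: counts[base] / n for base in ALPHABET})
    return freqs


def information_content(sites, background=None):
    """Relative entropy of the motif versus the background, in bits.

    Assumes a uniform background unless one is supplied. Terms with zero
    frequency contribute nothing (the limit of f * log f as f -> 0).
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    total = 0.0
    for column in column_frequencies(sites):
        for base, f in column.items():
            if f > 0:
                total += f * math.log2(f / background[base])
    return total


def log_likelihood_ratio(sites, background=None):
    """LLR of the sites under the motif model versus the background.

    With raw frequencies (no pseudocounts), summing count * log2(f / p)
    over all cells equals the number of sites times the IC.
    """
    return len(sites) * information_content(sites, background)
```

For example, `information_content(["ACGT"] * 4)` gives 8.0 bits under the uniform background assumed here (2 bits for each of the four fully conserved columns), and the corresponding LLR is 32.0. A sampler that targets these scores would re-evaluate them as candidate sites are swapped in and out of the alignment.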

Availability: http://rsat.ulb.ac.be/rsat/info-gibbs

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Entropy
  • Genomics
  • Likelihood Functions
  • Oligonucleotide Array Sequence Analysis / methods
  • Transcription Factors / chemistry

Substances

  • Transcription Factors