cWINNOWER algorithm for finding fuzzy dna motifs

J Bioinform Comput Biol. 2004 Mar;2(1):47-60. doi: 10.1142/s0219720004000466.

Abstract

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4).

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics*
  • Base Sequence
  • Binding Sites / genetics
  • DNA / chemistry*
  • DNA / genetics*
  • Fuzzy Logic
  • Gene Expression Profiling / methods*
  • Molecular Sequence Data
  • Protein Binding / genetics
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Stochastic Processes

Substances

  • DNA