Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

Genome Biol. 2004;5(9):R61. doi: 10.1186/gb-2004-5-9-r61. Epub 2004 Aug 20.

Abstract

Background: The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters.
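
The cluster search summarized above can be illustrated with a minimal sliding-window sketch. It is not the authors' implementation: the function name `find_site_clusters`, the `window` and `min_sites` parameters, and the assumption that predicted binding-site positions are already available as integer coordinates are all hypothetical choices made for illustration.

```python
def find_site_clusters(site_positions, window, min_sites):
    """Return merged (start, end) intervals in which at least `min_sites`
    predicted binding sites fall within a span shorter than `window` bp.

    site_positions: iterable of integer coordinates of predicted sites
                    (hypothetical input; how sites are predicted is out of scope).
    """
    positions = sorted(site_positions)
    clusters = []
    left = 0
    for right, pos in enumerate(positions):
        # Advance the left edge until the span is shorter than `window` bp.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = positions[left], pos
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous dense interval: extend it.
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
            else:
                clusters.append((start, end))
    return clusters


# Toy coordinates, not real genomic data.
example = find_site_clusters([10, 20, 30, 500, 900, 910, 920, 930],
                             window=50, min_sites=3)
print(example)
```

With these toy coordinates, the isolated site at position 500 is ignored and the two dense runs are reported as separate intervals.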

Results: We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns.
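
The abstract does not state how conservation of binding-site clustering is scored. As one illustrative proxy only, the sketch below compares predicted-site density in a D. melanogaster cluster with the density in its aligned D. pseudoobscura region. The function names, the density-ratio measure, and the `min_ratio` threshold are assumptions, not the method used in the study.

```python
def site_density(n_sites, length_bp):
    """Predicted binding sites per base pair."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return n_sites / length_bp


def clustering_is_conserved(mel_sites, mel_length, pse_sites, pse_length, min_ratio):
    """Hypothetical test: is site density in the orthologous region at least
    `min_ratio` times the density in the D. melanogaster cluster?"""
    mel_density = site_density(mel_sites, mel_length)
    if mel_density == 0:
        return False  # no cluster to conserve
    ratio = site_density(pse_sites, pse_length) / mel_density
    return ratio >= min_ratio


# Toy counts, not data from the paper.
print(clustering_is_conserved(20, 1000, 18, 1200, min_ratio=0.5))
print(clustering_is_conserved(20, 1000, 4, 1000, min_ratio=0.5))
```

Unlike percent identity, a measure of this kind does not require individual sites to align; it asks only whether the orthologous region remains dense in predicted sites.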

Conclusions: Measuring conservation of sequence features closely linked to function, such as binding-site clustering, makes better use of comparative sequence data than commonly used methods that examine only sequence identity.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Animals, Genetically Modified / genetics
  • Binding Sites / genetics
  • Binding Sites / physiology
  • Computational Biology / methods*
  • Conserved Sequence / genetics*
  • DNA, Intergenic / genetics
  • DNA-Binding Proteins / genetics
  • Drosophila / genetics*
  • Drosophila Proteins / genetics
  • Drosophila melanogaster / genetics*
  • Enhancer Elements, Genetic / genetics*
  • Fushi Tarazu Transcription Factors
  • Gene Expression Regulation, Developmental / genetics*
  • Genome
  • Homeodomain Proteins / genetics
  • POU Domain Factors
  • Predictive Value of Tests
  • Repressor Proteins / genetics
  • Software
  • Transcription Factors / genetics
  • Transcription Factors / physiology*
  • Transgenes / genetics

Substances

  • DNA, Intergenic
  • DNA-Binding Proteins
  • Drosophila Proteins
  • Fushi Tarazu Transcription Factors
  • Homeodomain Proteins
  • POU Domain Factors
  • Repressor Proteins
  • Transcription Factors
  • ftz protein, Drosophila
  • gt protein, Drosophila
  • nub protein, Drosophila
  • pdm2 protein, Drosophila
  • odd protein, Drosophila