Prediction of gene expression specificity by promoter sequence patterns

DNA Res. 1997 Apr 28;4(2):81-90. doi: 10.1093/dnares/4.2.81.

Abstract

We present a heuristic method for predicting expression specificity in transcription, which is known to be regulated in large part by promoter sequences. The method observes the occurrence of conserved sequence patterns in a group of known promoters, such as those of housekeeping or tissue-specific genes. Statistically conserved patterns were extracted automatically from a set of unaligned sequences spanning up to 200 bp upstream of the transcription initiation site, by a standard procedure using Markov chain and binomial distribution models. Furthermore, to obtain signal sequences of optimal length, we devised a method that combines multiple alignment with analysis of information content (relative entropy). Groups of related promoters were compiled from the EPD eukaryotic promoter database and the EMBL nucleic acid sequence database. To test the validity of the extracted patterns, each promoter was examined for its specificity by linear discriminant analysis. Our method correctly discriminated 77.6% of the housekeeping gene promoters and 62.9% of the liver promoters.
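The abstract does not spell out the statistical procedure, so the following is only a minimal sketch of how a Markov chain background and a binomial model might be combined to score a candidate pattern, together with the information content of one alignment column. Everything in it is an assumption made for illustration: the first-order chain, the "at least one occurrence per sequence" approximation, the uniform background, and the function names `markov_kmer_prob`, `binomial_tail`, `pattern_pvalue`, and `information_content` are not taken from the paper.

```python
from collections import Counter
from math import comb, log2


def markov_kmer_prob(kmer, seqs):
    """Probability of `kmer` at one fixed position under a first-order
    Markov chain whose parameters are estimated from `seqs` (assumed model)."""
    mono, di = Counter(), Counter()
    for s in seqs:
        mono.update(s)
        di.update(s[i:i + 2] for i in range(len(s) - 1))
    p = mono[kmer[0]] / sum(mono.values())
    for a, b in zip(kmer, kmer[1:]):
        # Transition probability a -> b, from dinucleotide counts.
        from_a = sum(c for d, c in di.items() if d[0] == a)
        p *= di[a + b] / from_a if from_a else 0.0
    return p


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def pattern_pvalue(kmer, seqs):
    """Chance that `kmer` occurs in at least as many sequences as observed,
    treating sequences as independent trials (a simplifying assumption)."""
    p_site = markov_kmer_prob(kmer, seqs)
    # Approximate per-sequence probability of at least one occurrence.
    p_seqs = [1 - (1 - p_site) ** max(len(s) - len(kmer) + 1, 0) for s in seqs]
    p_seq = sum(p_seqs) / len(p_seqs)
    hits = sum(kmer in s for s in seqs)
    return binomial_tail(hits, len(seqs), p_seq)


def information_content(column, background):
    """Relative entropy (bits) of one alignment column against `background`."""
    n = len(column)
    return sum((c / n) * log2((c / n) / background[b])
               for b, c in Counter(column).items())


# Toy sequences, invented for illustration only (not data from the paper).
toy = ["ACGTTATAAAGC", "GGCTATAAATCA", "TTGACTATAAAC", "CATATAAAGGTC"]
uniform = {b: 0.25 for b in "ACGT"}
pval = pattern_pvalue("TATAAA", toy)
ic = information_content("AAAA", uniform)
```

A small p-value from `pattern_pvalue` marks a pattern as over-represented relative to the background model. Summing `information_content` over the columns of an alignment is one way to compare candidate signal lengths, which is the role the abstract assigns to relative entropy.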

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Composition
  • Binomial Distribution
  • Discriminant Analysis
  • Gene Expression / genetics*
  • Genes, Reporter
  • Humans
  • Liver
  • Markov Chains
  • Models, Genetic*
  • Molecular Sequence Data
  • Promoter Regions, Genetic*

Associated data

  • GENBANK/J00098
  • GENBANK/K02048
  • GENBANK/K02212
  • GENBANK/M10065
  • GENBANK/M10949
  • GENBANK/M11228
  • GENBANK/M11518
  • GENBANK/M12792
  • GENBANK/M13075
  • GENBANK/M15082
  • GENBANK/M15657
  • GENBANK/M19808
  • GENBANK/M34058
  • GENBANK/M35425
  • GENBANK/M57450
  • GENBANK/X01793
  • GENBANK/X02415
  • GENBANK/X02775
  • GENBANK/X03258
  • GENBANK/X04898
  • GENBANK/X04981
  • GENBANK/X05018
  • GENBANK/X05151
  • GENBANK/X05331
  • GENBANK/X05779
  • GENBANK/X06482
  • GENBANK/X12662
  • GENBANK/X15323
  • GENBANK/X16789
  • GENBANK/X53038