Sequence complexity and DNA curvature

Comput Chem. 1999 Jun 15;23(3-4):263-74. doi: 10.1016/s0097-8485(99)00007-8.

Abstract

A linguistic complexity measure was applied to the complete genomes of HIV-1, Escherichia coli, Bacillus subtilis, Haemophilus influenzae, and Mycoplasma genitalium, and to long human and yeast genomic fragments. Complexity values averaged over entire genomic sequences were compared, as were predicted average values of intrinsic DNA curvature. We found that both the most curved and the least complex fragments are located preferentially in non-coding parts of the genome. Analysis of the locations of the most curved and the simplest regions in bacteria showed that the low-complexity segments are preferentially located in close proximity to the highly curved sequences, which are, in turn, placed 100 to 200 bases upstream of the start of the nearest coding sequence. We conclude that the parallel analysis of sequence complexity and DNA curvature might provide important information about sequence-structure-function relationships in genomes.
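
The abstract does not define the complexity measure. As a rough illustration only, the sketch below assumes a common vocabulary-usage formulation: for each word length k, the number of distinct k-mers observed in a window is divided by the maximum number possible for that window, and the ratios are multiplied together. The function names, the window size, the step, and the maximum word length are all hypothetical choices, not parameters taken from the paper.

```python
def vocabulary_usage(seq, k, alphabet_size=4):
    """Fraction of possible k-mers that actually occur in seq (assumed definition)."""
    n = len(seq)
    if k < 1 or k > n:
        raise ValueError("k must be between 1 and len(seq)")
    observed = len({seq[i:i + k] for i in range(n - k + 1)})
    # A window of length n holds at most n - k + 1 words, and at most
    # alphabet_size ** k distinct words exist at all.
    max_possible = min(alphabet_size ** k, n - k + 1)
    return observed / max_possible


def linguistic_complexity(seq, max_word=6, alphabet_size=4):
    """Product of vocabulary usages for word lengths 1..max_word (assumed form)."""
    seq = seq.upper()
    complexity = 1.0
    for k in range(1, min(max_word, len(seq)) + 1):
        complexity *= vocabulary_usage(seq, k, alphabet_size)
    return complexity


def sliding_complexity(seq, window=100, step=10, max_word=6):
    """Return (start, complexity) pairs for windows along seq.

    window, step, and max_word are illustrative defaults, not values from the paper.
    """
    return [
        (start, linguistic_complexity(seq[start:start + window], max_word))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    simple = "A" * 8          # homopolymer run: low complexity
    varied = "ACGTTGCA"       # mixed sequence: higher complexity
    print(linguistic_complexity(simple, max_word=3))
    print(linguistic_complexity(varied, max_word=3))
```

Under this assumed definition, a homopolymer run scores close to 0 and a window that uses every possible word scores 1, so the simplest regions of a genome would appear as minima in the output of `sliding_complexity`. Predicting curvature would additionally require a dinucleotide geometry model, which the abstract does not specify, so it is not sketched here.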

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bacillus subtilis / genetics
  • DNA / chemistry
  • DNA / genetics*
  • Escherichia coli / genetics
  • Genome*
  • HIV-1 / genetics
  • Haemophilus influenzae / genetics
  • Humans
  • Mycoplasma / genetics
  • Nucleic Acid Conformation*
  • Saccharomyces cerevisiae / genetics

Substances

  • DNA