Sequence complexity and DNA curvature

A Gabrielian; A Bolshoy

doi:10.1016/s0097-8485(99)00007-8

Comput Chem. 1999 Jun 15;23(3-4):263-74. doi: 10.1016/s0097-8485(99)00007-8.

Authors

A Gabrielian¹, A Bolshoy

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

PMID: 10404619

Abstract

A linguistic complexity measure was applied to the complete genomes of HIV-1, Escherichia coli, Bacillus subtilis, Haemophilus influenzae, and Mycoplasma genitalium, and to long human and yeast genomic fragments. Complexity values averaged over entire genomic sequences were compared, as were predicted average values of intrinsic DNA curvature. We found that both the most curved and the least complex fragments are located preferentially in non-coding parts of the genome. Analysis of the locations of the most curved and the simplest regions in bacteria showed that the low-complexity segments lie preferentially in close proximity to the highly curved sequences, which are, in turn, located 100 to 200 bases upstream of the start of the nearest coding sequence. We conclude that the parallel analysis of sequence complexity and DNA curvature might provide important information about sequence-structure-function relationships in genomes.
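The abstract does not spell out how the linguistic complexity measure is computed. As a minimal sketch, the Python below assumes one common formulation: for each word length k, the "vocabulary usage" is the number of distinct k-mers observed divided by the maximum number possible in a sequence of that length, and the complexity is the product of these ratios over k = 1..max_k. The function names, the `max_k` cutoff, and the window parameters are hypothetical choices made for illustration and may differ from what the authors used.

```python
def vocabulary_usage(seq, k, alphabet_size=4):
    """Fraction of the possible k-mer vocabulary actually present in seq.

    Assumed definition: distinct k-mers observed, divided by the smaller of
    alphabet_size**k and the number of k-mer positions in the sequence.
    """
    n_positions = len(seq) - k + 1
    if n_positions <= 0:
        raise ValueError("sequence is shorter than the word length k")
    observed = len({seq[i:i + k] for i in range(n_positions)})
    max_possible = min(alphabet_size ** k, n_positions)
    return observed / max_possible


def linguistic_complexity(seq, max_k=3):
    """Product of vocabulary usages for word lengths 1..max_k (assumed form)."""
    seq = seq.upper()
    complexity = 1.0
    for k in range(1, max_k + 1):
        complexity *= vocabulary_usage(seq, k)
    return complexity


def complexity_profile(seq, window, step=1, max_k=3):
    """Return (start, complexity) pairs for windows sliding along seq."""
    return [
        (start, linguistic_complexity(seq[start:start + window], max_k))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: a mixed segment followed by a homopolymer run.
    toy = "ACGTACGA" + "AAAAAAAA"
    profile = complexity_profile(toy, window=8, step=8, max_k=2)
    simplest_start, simplest_value = min(profile, key=lambda item: item[1])
    print(profile)
    print("least complex window starts at", simplest_start)
```

A profile of this kind could, in principle, be scanned for its lowest-scoring windows and compared against the positions of annotated coding-sequence starts, which is the type of positional comparison the abstract describes. Predicting intrinsic curvature requires a separate structural model that the abstract does not specify, so it is not sketched here.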

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Bacillus subtilis / genetics
DNA / chemistry
DNA / genetics*
Escherichia coli / genetics
Genome*
HIV-1 / genetics
Haemophilus influenzae / genetics
Humans
Mycoplasma / genetics
Nucleic Acid Conformation*
Saccharomyces cerevisiae / genetics

Substances

DNA