Computational DNA sequence analysis

S Karlin; L R Cardon

doi:10.1146/annurev.mi.48.100194.003155

Annu Rev Microbiol. 1994:48:619-54. doi: 10.1146/annurev.mi.48.100194.003155.

Authors

S Karlin¹, L R Cardon

Affiliation

¹ Department of Mathematics, Stanford University, California 94305.

PMID: 7826021

Abstract

This paper reviews several new developments in computer and statistical analysis of DNA and protein sequences. We present criteria and describe means for assessing and interpreting genomic inhomogeneities within and between sequences. These include: (a) characterizations of short oligonucleotide biases and general compositional tendencies; (b) molecular evolutionary reconstructions based on dinucleotide relative abundance distance measures and partial orderings; and (c) the application of r-scan statistics, quantile distributions, and score-based analyses to identify clustering, overdispersion, and excessive evenness in the distribution of a marker array along a sequence. These apply, for example, to restriction sites, microsatellite runs, regulatory motifs, and nucleosome placements. Furthermore, (d) the definition and determination of rare and frequent oligonucleotides and peptides provides another perspective on sequence heterogeneity, and (e) score methods are also applied in exon and gene locations. Most of the ideas and methods are illustrated with respect to bacteriophage genomes, to megabase amounts of several eukaryotic sequences, to a diverse collection of bacterial sets, to mitochondrial chromosomes, and to a broad assembly of viral genomes.
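As an illustration of item (b), the following is a minimal sketch of a dinucleotide relative abundance profile and a distance between two sequences. The abstract does not state the formulas, so this assumes the commonly used odds-ratio form rho_xy = f_xy / (f_x * f_y) and takes the distance as the mean absolute difference of rho over the 16 dinucleotides. The function names `relative_abundance` and `delta_distance` are hypothetical, and the sketch works on a single strand only, without combining the sequence with its reverse complement.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def relative_abundance(seq):
    """Return {dinucleotide: rho} with rho_xy = f_xy / (f_x * f_y).

    Assumed definition (not taken from the abstract). Frequencies are
    computed on the given strand only. Characters outside ACGT count
    toward the sequence length but get no entry of their own.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)                                  # base counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))     # overlapping pairs
    rho = {}
    for x, y in product(BASES, repeat=2):
        fx = mono[x] / n
        fy = mono[y] / n
        fxy = di[x + y] / (n - 1)
        # rho is undefined when either base is absent from the sequence
        rho[x + y] = fxy / (fx * fy) if fx > 0 and fy > 0 else float("nan")
    return rho


def delta_distance(seq_a, seq_b):
    """Mean absolute difference of rho over the 16 dinucleotides (assumed form)."""
    ra = relative_abundance(seq_a)
    rb = relative_abundance(seq_b)
    return sum(abs(ra[k] - rb[k]) for k in ra) / 16


if __name__ == "__main__":
    a = "ACGT" * 50
    b = "AACCGGTT" * 25
    print(round(delta_distance(a, b), 3))
```

Values of rho well above 1 indicate an over-represented dinucleotide and values well below 1 an under-represented one. If either sequence lacks one of the four bases, the distance evaluates to NaN, so a real analysis would need to handle that case explicitly.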

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.
Review

MeSH terms

Amino Acid Sequence
Animals
Base Sequence
DNA / genetics*
Genetic Variation / genetics
Humans
Models, Statistical
Molecular Sequence Data
Oligonucleotides / genetics
Phylogeny*
Sequence Analysis, DNA*

Substances

Oligonucleotides
DNA
