Multiple alignment using hidden Markov models

S R Eddy

Proc Int Conf Intell Syst Mol Biol. 1995;3:114-20.

Author

S R Eddy¹

Affiliation

¹ Dept. of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA.

PMID: 7584426

Abstract

A simulated annealing method is described for training hidden Markov models and producing multiple sequence alignments from initially unaligned protein or DNA sequences. Simulated annealing in turn uses a dynamic programming algorithm for correctly sampling suboptimal multiple alignments according to their probability and a Boltzmann temperature factor. The quality of simulated annealing alignments is evaluated on structural alignments of ten different protein families, and compared to the performance of other HMM training methods and the ClustalW program. Simulated annealing is better able to find near-global optima in the multiple alignment probability landscape than the other tested HMM training methods. Neither ClustalW nor simulated annealing consistently produces better alignments than the other. Examination of the specific cases in which ClustalW outperforms simulated annealing, and vice versa, provides insight into the strengths and weaknesses of current hidden Markov model approaches.
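The abstract names the sampling step but does not spell it out. The sketch below illustrates one standard way to sample state paths (alignments) according to their probability and a temperature T. It is assumed here and is not the paper's implementation. The steps are: raise every model probability to 1/T, run the forward algorithm, then trace back stochastically. Each complete path π is then drawn with probability proportional to P(x, π)^(1/T). A high T flattens the distribution toward more random paths. As T approaches 0, sampling concentrates on the single most probable (Viterbi) path. The sketch uses a plain two-state DNA HMM rather than the profile HMMs used for multiple alignment. The model, its parameters, and the names tempered_forward and sample_path are hypothetical.

```python
import random

# Hypothetical toy two-state DNA HMM (illustration only, not from the paper):
# state "H" favours G/C, state "L" favours A/T.
STATES = ["H", "L"]
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.8, "L": 0.2}, "L": {"H": 0.3, "L": 0.7}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def tempered_forward(seq, states, start, trans, emit, T):
    """Forward matrix in which every probability is raised to 1/T.

    P(seq, path) is a product of start, transition and emission terms, so
    this gives each complete path the weight P(seq, path) ** (1/T).
    Plain probability space: long sequences will underflow (real
    implementations work in log space or rescale each column).
    """
    beta = 1.0 / T
    F = [{s: 0.0 for s in states} for _ in seq]
    for s in states:
        F[0][s] = (start[s] * emit[s][seq[0]]) ** beta
    for i in range(1, len(seq)):
        for s in states:
            incoming = sum(F[i - 1][r] * trans[r][s] ** beta for r in states)
            F[i][s] = incoming * emit[s][seq[i]] ** beta
    return F


def sample_path(seq, states, start, trans, emit, T, rng):
    """Stochastic traceback: draw one state path with probability
    proportional to P(seq, path) ** (1/T)."""
    beta = 1.0 / T
    F = tempered_forward(seq, states, start, trans, emit, T)
    last = len(seq) - 1
    # The final state is chosen in proportion to its forward value.
    path = [rng.choices(states, weights=[F[last][s] for s in states])[0]]
    # Walk backwards: the predecessor r of the current state is chosen in
    # proportion to F[i-1][r] * trans[r][current] ** (1/T).
    for i in range(last, 0, -1):
        current = path[-1]
        weights = [F[i - 1][r] * trans[r][current] ** beta for r in states]
        path.append(rng.choices(states, weights=weights)[0])
    path.reverse()
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    seq = "ATTAGCGCGCATAT"
    # Lower T gives samples that agree more closely with each other.
    for T in (2.0, 1.0, 0.2):
        samples = ["".join(sample_path(seq, STATES, START, TRANS, EMIT, T, rng))
                   for _ in range(3)]
        print(f"T={T}: {samples}")
```

In an annealing schedule, T starts high and is lowered gradually. Early samples therefore explore many alternative paths, and later samples settle toward high-probability ones. Applying this to multiple alignment means replacing the toy HMM with a profile HMM, whose sampled paths define each sequence's alignment to the model.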

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Amino Acid Sequence
Base Sequence
Consensus Sequence*
DNA / chemistry*
Epidermal Growth Factor / chemistry
Markov Chains*
Molecular Sequence Data
Odds Ratio
Probability
Proteins / chemistry*
Sequence Homology, Amino Acid*
Sequence Homology, Nucleic Acid*

Substances

Proteins
Epidermal Growth Factor
DNA

Grants and funding

1-F32-GM16932-01/GM/NIGMS NIH HHS/United States