Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores

R Mott

doi:10.1007/BF02458620


Bull Math Biol. 1992 Jan;54(1):59-75. doi: 10.1007/BF02458620.

Author

R Mott¹

Affiliation

¹ Laboratory of Mathematical Biology, National Institute for Medical Research, Mill Hill, NW7 1AA, London, U.K., r-mott@uk.ac.mrc.nimr.

PMID: 25665661

Abstract

A method is described for estimating the distribution, and hence testing the statistical significance, of sequence similarity scores obtained during a data-bank search. Maximum likelihood is used to fit a model to the scores, avoiding any costly simulation of random sequences. The method is applied in detail to the Smith-Waterman algorithm when gaps are allowed, and is shown to give results very similar to those obtained by simulation.
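
The general idea of fitting a score distribution by maximum likelihood, rather than by simulating random sequences, can be illustrated with a minimal sketch. The abstract does not state the form of the fitted model, so the code below assumes, purely for illustration, a Gumbel (extreme-value) distribution with a single location and scale; it ignores any dependence on sequence length or composition and is not the paper's model. The function names `fit_gumbel_mle`, `gumbel_pvalue`, and `sample_gumbel` are hypothetical.

```python
import math
import random


def fit_gumbel_mle(scores, iterations=60):
    """Maximum-likelihood fit of a Gumbel(mu, beta) to a list of scores.

    ASSUMPTION: the Gumbel form is chosen for illustration only.
    The scale beta solves  beta = mean(x) - weighted_mean(x),
    with weights exp(-x / beta); it is found here by bisection.
    """
    n = len(scores)
    mean = sum(scores) / n
    x_min, x_max = min(scores), max(scores)
    spread = x_max - x_min
    if spread <= 0:
        raise ValueError("scores must not all be equal")

    def g(beta):
        # Shift by x_min so the exponentials cannot overflow.
        w = [math.exp(-(x - x_min) / beta) for x in scores]
        weighted_mean = sum(x * wi for x, wi in zip(scores, w)) / sum(w)
        return beta - mean + weighted_mean

    # g is negative near zero and positive at twice the spread.
    lo, hi = 1e-6 * spread, 2.0 * spread
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)

    # Given beta, the location has a closed form.
    w_mean = sum(math.exp(-(x - x_min) / beta) for x in scores) / n
    mu = x_min - beta * math.log(w_mean)
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(S >= score) under the fitted Gumbel model."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


def sample_gumbel(mu, beta, n, rng):
    """Draw n Gumbel variates by inverting the CDF (stand-in for real scores)."""
    out = []
    while len(out) < n:
        u = rng.random()
        if 0.0 < u < 1.0:
            out.append(mu - beta * math.log(-math.log(u)))
    return out
```

In use, the scores from a single data-bank search would be passed to `fit_gumbel_mle`, and `gumbel_pvalue` would then give the probability of a score at least as large as a given hit under the fitted model. `sample_gumbel` exists only to generate synthetic scores for checking the fit.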

MeSH terms

Algorithms*
Amino Acid Sequence
Computer Simulation
Data Interpretation, Statistical*
Likelihood Functions*
Models, Statistical*
Molecular Sequence Data
Proteins / chemistry*
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*
Sequence Homology
Statistical Distributions

Substances

Proteins