Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores

Bull Math Biol. 1992 Jan;54(1):59-75. doi: 10.1007/BF02458620.

Abstract

A method is described for estimating the distribution, and hence testing the statistical significance, of sequence similarity scores obtained during a data-bank search. Maximum likelihood is used to fit a model to the scores, avoiding any costly simulation of random sequences. The method is applied in detail to the Smith-Waterman algorithm when gaps are allowed, and is shown to give results very similar to those obtained by simulation.
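
As an illustration of the general idea only, the following Python sketch fits a distribution to a set of search scores by maximum likelihood and then converts a score into a p-value. The abstract does not state the form of the fitted model, so the two-parameter extreme value (Gumbel) distribution used here is an assumption, as are the function names `fit_gumbel_mle`, `gumbel_pvalue` and `simulate_scores`; the published model may well include further terms (for example, dependence on sequence length or composition) that this sketch ignores.

```python
import math
import random


def fit_gumbel_mle(scores, tol=1e-10):
    """Fit a Gumbel distribution to scores by maximum likelihood.

    Returns (mu, lam): location mu and rate lam (1 / scale).
    ASSUMPTION: a plain two-parameter Gumbel model, not the paper's model.
    """
    n = len(scores)
    s_min = min(scores)
    if n < 2 or max(scores) == s_min:
        raise ValueError("need at least two distinct scores")
    mean = sum(scores) / n

    def score_equation(lam):
        # Likelihood equation for lam; strictly decreasing in lam.
        # Subtracting s_min keeps every exponent <= 0 (no overflow).
        w = [math.exp(-lam * (x - s_min)) for x in scores]
        weighted_mean = sum(wi * x for wi, x in zip(w, scores)) / sum(w)
        return 1.0 / lam - mean + weighted_mean

    # Bracket the root, then bisect.
    lo, hi = 1e-8, 1.0
    while score_equation(hi) > 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score_equation(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)

    # Closed-form location estimate given lam.
    w_mean = sum(math.exp(-lam * (x - s_min)) for x in scores) / n
    mu = s_min - math.log(w_mean) / lam
    return mu, lam


def gumbel_pvalue(score, mu, lam):
    """P(S >= score) under the fitted Gumbel model."""
    return -math.expm1(-math.exp(-lam * (score - mu)))


def simulate_scores(n, mu, lam, seed=0):
    """Draw n synthetic Gumbel scores (stand-in for real search scores)."""
    rng = random.Random(seed)
    return [mu - math.log(-math.log(rng.uniform(1e-12, 1 - 1e-12))) / lam
            for _ in range(n)]


if __name__ == "__main__":
    scores = simulate_scores(5000, mu=30.0, lam=0.25)
    mu_hat, lam_hat = fit_gumbel_mle(scores)
    print("fitted mu, lambda:", mu_hat, lam_hat)
    print("p-value for score 60:", gumbel_pvalue(60.0, mu_hat, lam_hat))
```

In a real search the list passed to `fit_gumbel_mle` would be the similarity scores of the query against the data-bank entries; the synthetic scores here only stand in for them so that the sketch runs on its own.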

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Likelihood Functions*
  • Models, Statistical*
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology
  • Statistical Distributions

Substances

  • Proteins