Generalized affine gap costs for protein sequence alignment

Proteins. 1998 Jul 1;32(1):88-96.

Abstract

Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment.
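
The two cost schemes described above can be sketched as follows. This is an illustration only: the abstract does not state the formula, so the generalized form here (a region leaving k1 residues of one sequence and k2 of the other unaligned costs a + b|k1 - k2| + c min(k1, k2)) is an assumption, and the function names, parameter names, and numeric values are hypothetical.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Standard affine cost: a penalty for the existence of the gap
    plus a penalty proportional to its length."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


def generalized_affine_gap_cost(k1, k2, a, b, c):
    """Assumed generalized form (not given in the abstract).

    k1, k2: numbers of residues left unaligned in each sequence.
    a: penalty for the existence of the gap region.
    b: penalty per residue inserted or deleted (the length difference).
    c: penalty per pair of residues left unaligned against each other.
    """
    if k1 == 0 and k2 == 0:
        return 0
    return a + b * abs(k1 - k2) + c * min(k1, k2)


# With k2 == 0 the generalized cost reduces to the ordinary affine cost.
ordinary = affine_gap_cost(5, gap_open=11, gap_extend=1)
reduced = generalized_affine_gap_cost(5, 0, a=11, b=1, c=0.5)

# A nonconserved region of 30 vs. 28 residues is skipped as one unit.
skipped = generalized_affine_gap_cost(30, 28, a=11, b=1, c=0)
```

Under these assumptions, a small c makes it cheap to leave a long nonconserved stretch unaligned in both sequences at once, which is one way to read the abstract's statement that such regions can be "effectively ignored".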

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Molecular Sequence Data
  • Protein Conformation*
  • Proteins / chemistry*
  • Sequence Alignment*

Substances

  • Proteins