Several appropriate background distributions for entropy-based protein sequence conservation measures

J Theor Biol. 2010 Jan 21;262(2):317-22. doi: 10.1016/j.jtbi.2009.09.030. Epub 2009 Oct 4.

Abstract

The amino acid background distribution is an important factor in entropy-based methods that extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to yield a reliable observed background distribution. In this paper, we propose two new estimates of the background distribution. One integrates the observed background distribution with the position-specific residue distribution; the other is the normalized square root of the observed background frequencies. To validate these new background distributions, we apply them in the relative entropy model to identify catalytic sites and ligand-binding sites from protein MSAs. Experimental results show that both outperform the observed background distribution in predicting functionally important residues.
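
The abstract does not give formulas, so the following Python sketch is only one plausible reading of the two background estimates and of the relative entropy score. The function names, the pseudocount, and especially the mixing `weight` in `mixed_background` are illustrative assumptions, not values or definitions taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def observed_background(msa, pseudocount=1e-6):
    """Residue frequencies over the whole alignment (gaps ignored).

    The small pseudocount is an assumption that keeps every frequency positive.
    """
    counts = Counter(c for seq in msa for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def column_distribution(msa, col, pseudocount=1e-6):
    """Position-specific residue frequencies for one alignment column."""
    counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def sqrt_background(background):
    """Normalized square root of the observed background frequencies."""
    roots = {a: math.sqrt(p) for a, p in background.items()}
    norm = sum(roots.values())
    return {a: r / norm for a, r in roots.items()}


def mixed_background(background, column_dist, weight=0.5):
    """Blend of observed background and position-specific distribution.

    A linear mix with weight 0.5 is a hypothetical choice; the abstract
    only says the two distributions are integrated.
    """
    return {a: weight * background[a] + (1.0 - weight) * column_dist[a]
            for a in AMINO_ACIDS}


def relative_entropy(column_dist, background):
    """Relative entropy (in bits) of a column against a background."""
    return sum(p * math.log2(p / background[a])
               for a, p in column_dist.items() if p > 0)


# Toy alignment, invented purely for illustration.
toy_msa = ["ACDA", "ACEA", "ACFA"]
bg = observed_background(toy_msa)
for col in range(len(toy_msa[0])):
    p = column_distribution(toy_msa, col)
    print(col,
          relative_entropy(p, bg),
          relative_entropy(p, sqrt_background(bg)),
          relative_entropy(p, mixed_background(bg, p)))
```

Under this reading, taking the square root flattens the background toward uniform, which dampens the effect of residues that happen to be over- or under-represented in a small alignment.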

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Catalytic Domain
  • Conserved Sequence*
  • Entropy*
  • Ligands
  • ROC Curve
  • Sequence Analysis, Protein*

Substances

  • Ligands