Protein topology recognition from secondary structure sequences: application of the hidden Markov models to the alpha class proteins

J Mol Biol. 1997 Mar 28;267(2):446-63. doi: 10.1006/jmbi.1996.0874.

Abstract

The three-dimensional fold of a protein is described by the organization of its secondary structure elements in 3D space, i.e., its "topology". We find that protein topology can be recognized from the 1D sequence of secondary structure states of the residues alone. Automated recognition is facilitated by using hidden Markov models (HMMs) to represent topology families of proteins. Such models can be trained on the experimentally observed secondary structure sequences of family members using well-established algorithms. Here, we model various topology groups in the alpha class of proteins and identify, from a large database, those proteins having the topology described by each model. For observed secondary structure sequences, the correct topology family was recognized in 12 of 14 cases. When the observed secondary structure sequences are replaced with predicted sequences, recognition is still achieved in 8 of 14 cases. The success rate for observed sequences indicates that our approach will become increasingly useful as the accuracy of secondary structure prediction algorithms improves. Our study indicates that HMMs are useful for protein topology recognition even when no detectable primary amino acid sequence similarity is present. To illustrate the potential utility of our method, protein topology recognition is attempted on leptin, the obese gene product, and on the human interleukin-6 sequence, for which fold predictions have been previously published.
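The recognition idea (score a secondary structure string against one HMM per topology family and pick the best-scoring family) can be illustrated with a minimal Python sketch. It is not the authors' models: it assumes a three-letter alphabet (H helix, E strand, C coil), a hypothetical two-state ("helix"/"loop") architecture, hypothetical family names, and hand-picked probabilities rather than parameters trained on observed sequences. Each sequence is scored with the standard forward algorithm in log space.

```python
import math

# Hypothetical toy models (NOT the paper's): each topology family is a two-state
# HMM ("helix", "loop") emitting secondary structure letters H, E, C.
# Probabilities are hand-picked for illustration, not trained on real proteins.
EMIT = {
    "helix": {"H": 0.90, "E": 0.02, "C": 0.08},
    "loop": {"H": 0.10, "E": 0.10, "C": 0.80},
}
START = {"helix": 0.5, "loop": 0.5}

FAMILIES = {
    # Hypothetical family: long helices joined by short loops.
    "long_helix_family": {
        "start": START,
        "trans": {"helix": {"helix": 0.95, "loop": 0.05},
                  "loop": {"helix": 0.30, "loop": 0.70}},
        "emit": EMIT,
    },
    # Hypothetical family: short helices separated by longer coil stretches.
    "short_helix_family": {
        "start": START,
        "trans": {"helix": {"helix": 0.70, "loop": 0.30},
                  "loop": {"helix": 0.20, "loop": 0.80}},
        "emit": EMIT,
    },
}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(seq, model):
    """Forward algorithm in log space: returns log P(seq | model)."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    states = list(start)
    # alpha[s] = log P(seq[0..i], state at position i = s)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: logsumexp([alpha[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][symbol])
            for t in states
        }
    return logsumexp(list(alpha.values()))


def classify(seq, families=FAMILIES):
    """Assign seq to the family whose HMM gives it the highest log-likelihood."""
    scores = {name: log_likelihood(seq, m) for name, m in families.items()}
    return max(scores, key=scores.get), scores


long_bundle = "CC" + "H" * 20 + "CCC" + "H" * 20 + "CC"
short_helices = "HHHCCC" * 6
print(classify(long_bundle)[0])    # long_helix_family
print(classify(short_helices)[0])  # short_helix_family
```

Comparing raw log-likelihoods is the simplest decision rule and is adequate here only because both toy models share the same emission table and are scored on the same sequence; practical HMM classifiers often compare each score against a background (null) model, which this sketch does not do.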

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computer Simulation
  • Cytochromes / chemistry
  • Cytokines / chemistry
  • Databases, Factual
  • Globins / chemistry
  • Interleukin-6 / chemistry
  • Leptin
  • Markov Chains*
  • Models, Molecular
  • Protein Conformation*
  • Protein Folding
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Proteins / classification
  • Sequence Alignment

Substances

  • Cytochromes
  • Cytokines
  • Interleukin-6
  • Leptin
  • Proteins
  • Globins