FORESST: fold recognition from secondary structure predictions of proteins

Bioinformatics. 1999 Feb;15(2):131-40. doi: 10.1093/bioinformatics/15.2.131.

Abstract

Motivation: A method for recognizing the three-dimensional fold of a protein from its amino acid sequence, based on a combination of hidden Markov models (HMMs) and secondary structure prediction, was recently developed for proteins in the Mainly-Alpha structural class. Here, the methodology is extended to proteins in the Mainly-Beta and Alpha-Beta classes. Compared with other HMM-based fold recognition methods, this approach is novel in that it uses only secondary structure information. Each HMM is trained on the known secondary structure sequences of proteins sharing a similar fold. Secondary structure is then predicted from the amino acid sequence of a query protein, and the predicted fold of the query is the fold described by the model that best fits this predicted secondary structure sequence.
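The scoring step described above (choose the fold whose model best fits the predicted secondary structure sequence) can be sketched with the standard HMM forward algorithm. This is a minimal illustration under stated assumptions, not FORESST's implementation: the H/E/C alphabet (helix, strand, coil), the two-state toy models and their parameters, the fold labels `mainly_alpha` and `mainly_beta`, and the helpers `log_forward` and `predict_fold` are all hypothetical. The paper's HMMs are trained from known structures and are not reproduced here.

```python
import math

# Assumed secondary-structure alphabet: H = helix, E = strand, C = coil.
ALPHABET = "HEC"


def log_forward(seq, start, trans, emit):
    """Return log P(seq | model) via the forward algorithm (log space).

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    Assumes a non-empty seq and strictly positive probabilities.
    """
    n_states = len(start)
    # Initialise with the first symbol; log space avoids numerical underflow.
    alpha = [math.log(start[i]) + math.log(emit[i][seq[0]]) for i in range(n_states)]
    for symbol in seq[1:]:
        new_alpha = []
        for j in range(n_states):
            # log-sum-exp over every predecessor state i
            terms = [alpha[i] + math.log(trans[i][j]) for i in range(n_states)]
            peak = max(terms)
            total = peak + math.log(sum(math.exp(t - peak) for t in terms))
            new_alpha.append(total + math.log(emit[j][symbol]))
        alpha = new_alpha
    peak = max(alpha)
    return peak + math.log(sum(math.exp(a - peak) for a in alpha))


# Hypothetical two-state "fold models" (state 0 = element-rich, state 1 = coil-rich).
# Parameters are invented for illustration, not trained FORESST HMMs.
MODELS = {
    "mainly_alpha": {
        "start": [0.5, 0.5],
        "trans": [[0.9, 0.1], [0.2, 0.8]],
        "emit": [{"H": 0.90, "E": 0.02, "C": 0.08},
                 {"H": 0.10, "E": 0.10, "C": 0.80}],
    },
    "mainly_beta": {
        "start": [0.5, 0.5],
        "trans": [[0.8, 0.2], [0.3, 0.7]],
        "emit": [{"H": 0.02, "E": 0.90, "C": 0.08},
                 {"H": 0.10, "E": 0.10, "C": 0.80}],
    },
}


def predict_fold(predicted_ss, models=MODELS):
    """Return (best fold label, dict of log-likelihoods) for a predicted H/E/C string."""
    scores = {
        name: log_forward(predicted_ss, m["start"], m["trans"], m["emit"])
        for name, m in models.items()
    }
    return max(scores, key=scores.get), scores
```

For instance, `predict_fold("CCHHHHHHHHHHCCCHHHHHHHHHCC")[0]` returns `"mainly_alpha"`. Comparing raw log-likelihoods is reasonable here because every model scores the same query; a fuller method might also normalise scores against a null model, which this sketch omits.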

Results: After model cross-validation, the success rate on 44 test proteins covering the three structural classes was 59%. For seven fold predictions made before the experimental structures were published, the success rate was 71%. In conclusion, this approach captures important information about a protein's fold that is embedded in the length and arrangement of the predicted helices, strands and coils along the polypeptide chain. Once a more extensive library of HMMs representing the universe of known structural families is available (work in progress), the program will allow rapid screening of genomic databases and sequence annotation in cases where fold similarity cannot be detected from the amino acid sequence.
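The conclusion that fold information lies in the length and arrangement of predicted helices, strands and coils can be made concrete by collapsing a per-residue prediction into its runs of elements. The helpers `ss_segments` and `topology` below are hypothetical illustrations that assume an H/E/C string; they are not part of FORESST, whose HMMs are described as fitting the predicted secondary structure sequence itself.

```python
from itertools import groupby


def ss_segments(ss):
    """Collapse a per-residue H/E/C string into (element, start, length) runs.

    Hypothetical helper for illustration; start positions are 0-based.
    """
    segments = []
    pos = 0
    for element, run in groupby(ss):
        length = len(list(run))
        segments.append((element, pos, length))
        pos += length
    return segments


def topology(ss, min_len=1):
    """List the non-coil elements in order as e.g. 'H5' (helix of 5 residues).

    min_len is an assumed parameter for ignoring very short predicted elements.
    """
    return [f"{element}{length}"
            for element, _, length in ss_segments(ss)
            if element != "C" and length >= min_len]
```

For example, `topology("CCHHHHHCCEEEECC")` gives `["H5", "E4"]`: a five-residue helix followed by a four-residue strand. Two proteins with the same ordered pattern of element types and similar lengths would look alike to a model working on this information, even when their amino acid sequences share no detectable similarity.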

Availability: FORESST web server at http://absalpha.dcrt.nih.gov:8008/ for the library of HMMs of structural families used in this paper, and at http://www.tigr.org/ for a more extensive library of HMMs (work in progress).

Contact: valedf@tigr.org; munson@helix.nih.gov; garnier@helix.nih.gov

MeSH terms

  • Computer Simulation
  • Databases, Factual
  • Markov Chains
  • Protein Folding*
  • Protein Structure, Secondary*
  • Proteins / chemistry
  • Proteins / classification
  • Reproducibility of Results
  • Software*
  • Stochastic Processes

Substances

  • Proteins