Predicting protein structural class by incorporating patterns of over-represented k-mers into the general form of Chou's PseAAC

Protein Pept Lett. 2012 Apr;19(4):388-97. doi: 10.2174/092986612799789350.

Abstract

Computational prediction of protein structural class from sequence data remains a challenging problem in protein science. In this paper, a new feature extraction approach based on relative polypeptide composition is introduced. The approach accounts for the background distribution of a given k-mer under a Markov model of order k-2, and mitigates the curse of dimensionality that arises as k increases by using a T-statistic feature selection strategy. The selected features are then fed to a support vector machine to perform the prediction. To verify the performance of the method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison with existing methods shows that the method provides satisfactory performance for structural class prediction.
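To make the idea of a k-mer's background distribution under a Markov model of order k-2 more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the commonly used maximal-order estimator, in which the expected count of a k-mer is derived from the counts of its two overlapping (k-1)-mers and their shared (k-2)-mer, and it reports the observed/expected ratio as one possible "relative composition" feature. The abstract does not give the exact formula, so the estimator, the ratio, and the function names count_kmers and relative_composition are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers in seq; for k == 0, return the number of positions."""
    if k == 0:
        return Counter({"": len(seq)})
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def relative_composition(seq, k):
    """Observed/expected ratio for every k-mer that occurs in seq (k >= 2).

    Assumed estimator (Markov model of order k-2):
        E[w1..wk] = N(w1..w_{k-1}) * N(w2..wk) / N(w2..w_{k-1})
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    observed = count_kmers(seq, k)
    shorter = count_kmers(seq, k - 1)  # counts of (k-1)-mers
    core = count_kmers(seq, k - 2)     # counts of shared (k-2)-mers
    ratios = {}
    for kmer, n_obs in observed.items():
        # core count is non-zero because the k-mer itself occurs in seq
        expected = shorter[kmer[:-1]] * shorter[kmer[1:]] / core[kmer[1:-1]]
        ratios[kmer] = n_obs / expected
    return ratios


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    for kmer, ratio in sorted(relative_composition("MKVLAAGIVKVLAAG", 3).items()):
        print(kmer, round(ratio, 3))
```

A ratio above 1 marks a k-mer that occurs more often than the lower-order context predicts. In a pipeline like the one the abstract describes, such ratios would be computed per protein, filtered by a feature selection step, and passed to a classifier; those later stages are not shown here.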

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acids / chemistry*
  • Computational Biology*
  • Databases, Protein
  • Neural Networks, Computer
  • Protein Folding
  • Protein Structure, Tertiary*
  • Proteins / chemistry*
  • Proteins / classification
  • Sequence Analysis, Protein
  • Support Vector Machine

Substances

  • Amino Acids
  • Proteins