Prediction of protein solvent accessibility using support vector machines

Proteins. 2002 Aug 15;48(3):566-70. doi: 10.1002/prot.10176.

Abstract

A Support Vector Machine (SVM) learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to determine how they affect prediction performance. Using a cut-off threshold of 15%, which splits the dataset evenly (an equal number of exposed and buried residues), the method achieved a prediction accuracy of 70.1% with single-sequence input and 73.9% with multiple-alignment input. The prediction of three or more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
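The abstract describes the pipeline only at a high level: residues are labelled exposed or buried with a 15% cut-off, and each residue is represented by a sliding window of its sequence neighbours before being passed to an SVM. The Python below is a minimal sketch of that data preparation under stated assumptions, not the authors' implementation. It assumes a one-hot encoding with 21 symbols per window position (the 20 standard amino acids plus one extra slot for positions beyond either chain end), and it assumes a residue counts as exposed when its relative accessibility is strictly above the threshold. The function names (`label_residues`, `encode_window`, `build_dataset`, `rbf_kernel`) and the default window size of 9 are hypothetical. The RBF function illustrates one common kernel of the kind the abstract says was compared.

```python
import math
from typing import List, Sequence, Tuple

# Assumed encoding: 20 standard amino acids plus one extra slot used for
# window positions that fall off either end of the chain. Non-standard
# residue letters (e.g. 'X') are also mapped to this extra slot.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_INDEX = len(AMINO_ACIDS)                   # 20
FEATURES_PER_POSITION = len(AMINO_ACIDS) + 1   # 21


def label_residues(rel_acc: Sequence[float], threshold: float = 15.0) -> List[int]:
    """Two-state labels: +1 = exposed (relative accessibility > threshold %),
    -1 = buried. Treating values equal to the threshold as buried is an assumption."""
    return [1 if a > threshold else -1 for a in rel_acc]


def encode_window(seq: str, center: int, window: int) -> List[int]:
    """One-hot encode an odd-length window centred on residue `center`."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    vec = [0] * (window * FEATURES_PER_POSITION)
    for offset in range(-half, half + 1):
        pos = center + offset
        slot = offset + half  # 0 .. window-1
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            idx = AMINO_ACIDS.index(seq[pos])
        else:
            idx = PAD_INDEX  # outside the chain or non-standard residue
        vec[slot * FEATURES_PER_POSITION + idx] = 1
    return vec


def build_dataset(seq: str, rel_acc: Sequence[float], window: int = 9,
                  threshold: float = 15.0) -> Tuple[List[List[int]], List[int]]:
    """One feature vector and one label per residue, ready for an SVM trainer."""
    if len(seq) != len(rel_acc):
        raise ValueError("sequence and accessibility lists differ in length")
    X = [encode_window(seq, i, window) for i in range(len(seq))]
    y = label_residues(rel_acc, threshold)
    return X, y


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.1) -> float:
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # Toy sequence with made-up relative accessibilities (illustrative only).
    X, y = build_dataset("MKTAYIAK", [40.0, 10.0, 5.0, 60.0, 12.0, 3.0, 80.0, 20.0], window=5)
    print(len(X), len(X[0]), y)
    print(rbf_kernel(X[0], X[1]))
```

Changing `window` changes the feature length (21 × window), which is one way to set up the kind of window-size comparison the abstract mentions. With multiple-alignment input, a common alternative is to replace each one-hot column with residue frequencies from the alignment, though the abstract does not say which representation the authors used.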

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Computational Biology / methods*
  • Computer Simulation
  • Information Theory
  • Linear Models
  • Neural Networks, Computer
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Solvents / chemistry

Substances

  • Proteins
  • Solvents