Prediction of protein solvent accessibility using support vector machines

Zheng Yuan; Kevin Burrage; John S Mattick

doi:10.1002/prot.10176

Proteins. 2002 Aug 15;48(3):566-70. doi: 10.1002/prot.10176.

Authors

Zheng Yuan¹, Kevin Burrage, John S Mattick

Affiliation

¹ Institute for Molecular Bioscience and ARC Special Centre for Functional and Applied Genomics, The University of Queensland, Brisbane, Australia. z.yuan@imb.uq.edu.au

PMID: 12112679

Abstract

A support vector machine (SVM) learning system was trained to predict protein solvent accessibility from primary structure. Different kernel functions and sliding window sizes were explored to determine how they affect prediction performance. With a cut-off threshold of 15%, which splits the dataset evenly between exposed and buried residues, the method achieved a prediction accuracy of 70.1% for single-sequence input and 73.9% for multiple-sequence-alignment input. The prediction of three or more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. The results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the SVM method is a useful tool for biological sequence analysis.
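
The single-sequence setup described in the abstract can be illustrated with a minimal sketch of how residues might be turned into SVM inputs and two-state labels. Everything below is an assumption made for illustration rather than the authors' implementation: the one-hot encoding with an extra padding slot, the function names `encode_windows` and `two_state_labels`, the default window size, and the convention that a residue counts as exposed when its relative accessibility is strictly above the 15% threshold.

```python
# Illustrative sketch only: the encoding scheme and names are assumptions,
# not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# One extra slot marks window positions that fall beyond either chain terminus.
SLOTS = len(AMINO_ACIDS) + 1


def encode_windows(sequence, window_size=5):
    """Return one flat one-hot vector per residue, centred on that residue.

    Raises ValueError for an even window size or a non-standard residue letter.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    vectors = []
    for centre in range(len(sequence)):
        vector = []
        for pos in range(centre - half, centre + half + 1):
            slot = [0] * SLOTS
            if 0 <= pos < len(sequence):
                slot[AMINO_ACIDS.index(sequence[pos])] = 1
            else:
                slot[-1] = 1  # padding beyond the chain terminus
            vector.extend(slot)
        vectors.append(vector)
    return vectors


def two_state_labels(relative_accessibility, threshold=15.0):
    """Label residues exposed (+1) or buried (-1) from relative accessibility in percent.

    Treating values equal to the threshold as buried is an assumption.
    """
    return [1 if value > threshold else -1 for value in relative_accessibility]


# Hypothetical five-residue example with made-up accessibility values.
X = encode_windows("MKTAY", window_size=3)
y = two_state_labels([42.0, 8.5, 15.0, 60.2, 3.1])
```

The resulting vectors `X` and labels `y` have the shape a binary SVM classifier expects; changing `window_size` corresponds to the sliding window sizes the abstract says were explored.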

Publication types

Comparative Study
Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
Computational Biology / methods*
Computer Simulation
Information Theory
Linear Models
Neural Networks, Computer
Proteins / chemistry*
Reproducibility of Results
Sequence Alignment
Sequence Analysis, Protein / methods*
Solvents / chemistry

Substances

Proteins
Solvents