Boosting accuracy of automated classification of fluorescence microscope images for location proteomics

Kai Huang; Robert F Murphy

doi:10.1186/1471-2105-5-78

BMC Bioinformatics. 2004 Jun 18:5:78. doi: 10.1186/1471-2105-5-78.

Authors

Kai Huang¹, Robert F Murphy

Affiliation

¹ Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213 USA. khuang@andrew.cmu.edu

Abstract

Background: Detailed knowledge of the subcellular location of each expressed protein is critical to a full understanding of its function. Fluorescence microscopy, in combination with methods for fluorescent tagging, is the most suitable current method for proteome-wide determination of subcellular location. Previous work has shown that neural network classifiers can distinguish all major protein subcellular location patterns in both 2D and 3D fluorescence microscope images. Building on these results, we evaluate new classifiers and features to improve the recognition of these patterns in both types of image.

Results: We report a thorough comparison of eight state-of-the-art classification methods on this problem: neural networks; support vector machines with linear, polynomial, radial basis, and exponential radial basis kernel functions; and the ensemble methods AdaBoost, Bagging, and Mixtures-of-Experts. Ten-fold cross validation was used to evaluate each classifier with various parameters on different Subcellular Location Feature sets representing both 2D and 3D fluorescence microscope images, including new feature sets incorporating features derived from Gabor and Daubechies wavelet transforms. After optimal parameters were chosen for each of the eight classifiers, optimal majority-voting ensemble classifiers were formed for each feature set. Comparing the results of all eight classifiers on each image permits estimation of a lower bound on the classification error rate for each subcellular pattern, which we interpret as reflecting the fraction of cells whose patterns are distorted by mitosis, cell death, or acquisition errors. Overall, we obtained statistically significant improvements in classification accuracy over the best previously published results: the overall error rate was reduced by one-third to one-half, and the average accuracy for single 2D images exceeded 90% for the first time. In particular, the classification accuracy for the easily confused endomembrane compartments (endoplasmic reticulum, Golgi, endosomes, lysosomes) was improved by 5-15%. We achieved further improvements when classification was conducted on image sets rather than on individual cell images.
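The majority-voting and lower-bound ideas described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy labels and predictions, and the tie-breaking rule (the label seen first wins) are all assumptions made for illustration, and the selection of an "optimal" subset of classifiers is not reproduced. The lower bound is taken here as the fraction of images that every classifier mislabels, and set-level classification is approximated by a plurality vote over the single-image predictions in a set.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label; ties go to the label seen first (assumption)."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predictions(per_classifier):
    """Combine predictions image by image.

    per_classifier: one list of predicted labels per classifier,
    all lists in the same image order.
    """
    return [majority_vote(column) for column in zip(*per_classifier)]


def error_rate(predicted, truth):
    """Fraction of images whose predicted label differs from the true label."""
    return sum(p != t for p, t in zip(predicted, truth)) / len(truth)


def lower_bound_error(per_classifier, truth):
    """Fraction of images that every classifier gets wrong."""
    missed_by_all = [
        all(p != t for p in column)
        for column, t in zip(zip(*per_classifier), truth)
    ]
    return sum(missed_by_all) / len(truth)


def classify_set(image_predictions):
    """Label a set of images of one protein by plurality of its image labels."""
    return majority_vote(image_predictions)


# Toy data (hypothetical): six images, three classifiers.
truth = ["ER", "Golgi", "ER", "Lysosome", "Golgi", "ER"]
predictions = [
    ["ER", "Golgi", "Golgi", "Endosome", "Golgi", "ER"],  # classifier A
    ["ER", "ER", "ER", "Endosome", "Golgi", "ER"],        # classifier B
    ["Golgi", "Golgi", "ER", "Endosome", "Golgi", "ER"],  # classifier C
]

ensemble = ensemble_predictions(predictions)
individual_errors = [error_rate(p, truth) for p in predictions]
ensemble_error = error_rate(ensemble, truth)
bound = lower_bound_error(predictions, truth)
set_label = classify_set(["ER", "ER", "Golgi"])

print(individual_errors, ensemble_error, bound, set_label)
```

In this toy case each classifier makes a different isolated mistake, so the vote corrects those, while the one image mislabeled by all three classifiers remains wrong and sets the lower bound.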

Conclusions: The availability of accurate, fast, automated classification systems for protein location patterns, in conjunction with high-throughput fluorescence microscope imaging techniques, enables a new subfield of proteomics: location proteomics. With its accuracy and sensitivity, this approach represents an important alternative to low-resolution assignments by curation or sequence-based prediction.

Publication types

Comparative Study
Evaluation Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Cell Line, Tumor
Computational Biology / economics
HeLa Cells / chemistry
HeLa Cells / classification
Humans
Imaging, Three-Dimensional / classification
Intracellular Space / chemistry
Intracellular Space / classification
Microscopy, Fluorescence / classification*
Microscopy, Fluorescence / trends
Proteomics / classification*
Proteomics / trends
Sensitivity and Specificity
