Shape-based feature matching improves protein identification via LC-MS and tandem MS

J Comput Biol. 2011 Apr;18(4):547-57. doi: 10.1089/cmb.2010.0155. Epub 2011 Mar 21.

Abstract

The characterization of proteins via liquid chromatography-mass spectrometry (LC-MS) and tandem MS is a challenge due to the large dynamic range and the high complexity of the molecules of interest. In LC-MS experiments, inconsistent variation in the travel time of analytes through the LC column results in nonlinear shifts in the LC retention time (RT). This variability must be corrected to accurately match corresponding peptide features across samples. Standard methods for RT alignment applied to the raw data are computationally expensive, which makes it impractical to process a large number of samples. More successful algorithms perform the alignment on features that are matched across experiments based on pre-specified mass and RT windows. Features that match across multiple experiments are more likely to be true positives and are therefore better suited to drive the alignment correction. However, depending on the feature matching algorithm, ambiguities can arise when more than one candidate feature falls within the specified windows, which might affect the alignment performance. In addition, some feature-based alignment algorithms do not correct for nonlinear RT shifts. We propose a novel feature matching algorithm that incorporates wavelet-based shape information about the features. We tested our algorithm on two different applications of MS. First, we combined the feature matching algorithm with a robust nonparametric kernel-type regression to form a nonlinear feature-based alignment framework for LC-MS experiments. We validated our alignment framework on LC-MS data from complex samples with known spiked-in proteins, demonstrating our ability to correctly identify each of them with higher reproducibility and a higher probability score than the SuperHirn software. In addition, by using our feature-based alignment framework, we were able to increase the number of matched features and improve the correlation between replicates.
Second, we tested our feature matching algorithm on MALDI MS with MS/MS acquisitions. We found that, by using only features that matched across replicates of tandem mass spectra, we could improve the identification of peptides compared with the current state-of-the-art software. Supplementary Material is available online at www.libertonline.com/cmb.
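
As an illustration of the two ideas summarized above, window-based feature matching with shape-based disambiguation and a nonlinear RT correction fitted to the matched features, the following minimal Python sketch may help. It is not the published algorithm. Everything in it is an assumption made for illustration: the feature layout (dicts with `mz`, `rt`, `profile`), the function names, the tolerance and bandwidth defaults, the use of Pearson correlation of elution profiles as a simple stand-in for the wavelet-based shape information, and the use of a plain (non-robust) Nadaraya-Watson Gaussian kernel smoother as a stand-in for the robust kernel-type regression.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / math.sqrt(var_a * var_b)


def match_features(ref, sample, mz_tol=0.01, rt_tol=60.0):
    """Match each reference feature to at most one sample feature.

    Candidates must fall inside the mass and RT windows. When several
    candidates qualify, the one whose profile correlates best with the
    reference profile is kept (hypothetical stand-in for wavelet shape).
    Note: this simple loop does not prevent one sample feature from being
    chosen by two different reference features.
    """
    matches = []
    for i, r in enumerate(ref):
        candidates = [
            j for j, s in enumerate(sample)
            if abs(s["mz"] - r["mz"]) <= mz_tol
            and abs(s["rt"] - r["rt"]) <= rt_tol
        ]
        if candidates:
            best = max(
                candidates,
                key=lambda j: pearson(r["profile"], sample[j]["profile"]),
            )
            matches.append((i, best))
    return matches


def kernel_rt_shift(rt, anchor_rts, anchor_shifts, bandwidth=30.0):
    """Nadaraya-Watson estimate (Gaussian kernel) of the RT shift at `rt`."""
    weights = [math.exp(-0.5 * ((rt - a) / bandwidth) ** 2) for a in anchor_rts]
    total = sum(weights)
    if total == 0:
        return 0.0  # no anchor close enough to inform the estimate
    return sum(w * s for w, s in zip(weights, anchor_shifts)) / total


# Toy data (invented for illustration only).
ref = [
    {"mz": 500.250, "rt": 100.0, "profile": [1, 4, 9, 4, 1]},
    {"mz": 612.300, "rt": 400.0, "profile": [2, 8, 3, 1, 0]},
]
sample = [
    {"mz": 500.251, "rt": 110.0, "profile": [0, 1, 2, 8, 9]},  # in window, wrong shape
    {"mz": 500.252, "rt": 112.0, "profile": [1, 5, 9, 5, 1]},  # in window, similar shape
    {"mz": 612.301, "rt": 420.0, "profile": [2, 7, 3, 1, 0]},
]

matches = match_features(ref, sample)

# Matched pairs act as anchors: shift = reference RT minus sample RT.
anchor_rts = [sample[j]["rt"] for _, j in matches]
anchor_shifts = [ref[i]["rt"] - sample[j]["rt"] for i, j in matches]

# Corrected RT for every sample feature, using the smoothed shift curve.
aligned_rts = [
    s["rt"] + kernel_rt_shift(s["rt"], anchor_rts, anchor_shifts) for s in sample
]
```

In this toy case two sample features fall inside the windows of the first reference feature, and the shape score is what resolves the ambiguity. Because the shift is estimated locally around each RT, the correction can vary along the chromatogram instead of being a single global offset.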

Publication types

  • Validation Study

MeSH terms

  • Algorithms*
  • Chromatography, Liquid / methods
  • Humans
  • Peptides / chemistry
  • Proteins / chemistry*
  • Proteomics / methods*
  • Reproducibility of Results
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods
  • Tandem Mass Spectrometry / methods*

Substances

  • Peptides
  • Proteins