Partially sequenced organisms, decoy searches and false discovery rates

J Proteome Res. 2012 Mar 2;11(3):1991-5. doi: 10.1021/pr201035r. Epub 2012 Feb 16.

Abstract

Tandem mass spectrometry is commonly used to identify peptides, typically by comparing their product ion spectra with those predicted from a protein sequence database and scoring the matches. The most commonly reported quality metric for a set of peptide identifications is the false discovery rate (FDR), the expected fraction of false identifications in the set. This metric has so far been used only for completely sequenced organisms or known protein mixtures. We investigated whether FDR estimation is also applicable to partially sequenced organisms, where many high-quality spectra fail to identify the correct peptides because those peptides are absent from the searched sequence database. Using real data from human plasma and simulated partial sequence databases derived from two complete human sequence databases with different levels of redundancy, we demonstrate that the mixture model approach in PeptideProphet is robust for partial databases, particularly when used in combination with decoy sequences. We therefore recommend this method for estimating the FDR and reporting peptide identifications from incompletely sequenced organisms.
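
To make the FDR definition above concrete, the following is a minimal sketch of the simple target-decoy estimate: at a given score threshold, the number of accepted decoy matches is taken as an estimate of the number of false target matches. This is not the PeptideProphet mixture model evaluated in the paper. The function name `decoy_fdr`, the `(score, is_decoy)` tuple layout, and the example scores are all hypothetical and chosen only for illustration. The sketch also assumes a separate or concatenated decoy database of the same size as the target database.

```python
from typing import Iterable, Tuple


def decoy_fdr(psms: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Each item is a hypothetical (score, is_decoy) pair for one
    peptide-spectrum match. The estimate is decoys / targets, capped at 1.0.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        # No accepted target matches: nothing is reported, so return 0.0.
        return 0.0
    return min(1.0, decoys / targets)


# Made-up example data: higher score means a better match.
example_psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True), (7.5, False),
    (6.8, False), (6.1, True), (5.9, False), (5.2, True), (4.8, False),
]

if __name__ == "__main__":
    for cutoff in (8.0, 7.0, 0.0):
        print(cutoff, decoy_fdr(example_psms, cutoff))
```

With a partial database, many good spectra have no correct target sequence to match, so the proportion of incorrect target matches rises. Any FDR estimate, whether decoy-based or model-based, has to remain accurate under that shift, which is the question the study addresses.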

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Blood Proteins / chemistry
  • Blood Proteins / metabolism*
  • Computer Simulation
  • Databases, Protein*
  • Humans
  • Models, Biological
  • Peptide Fragments / chemistry
  • Peptide Mapping / methods*
  • Peptide Mapping / standards

Substances

  • Blood Proteins
  • Peptide Fragments