An automated proteogenomic method uses mass spectrometry to reveal novel genes in Zea mays

Mol Cell Proteomics. 2014 Jan;13(1):157-67. doi: 10.1074/mcp.M113.031260. Epub 2013 Oct 18.

Abstract

New technologies in genomics and proteomics have influenced the emergence of proteogenomics, a field at the confluence of genomics, transcriptomics, and proteomics. First generation proteogenomic toolkits employ peptide mass spectrometry to identify novel protein coding regions. We extend first generation proteogenomic tools to achieve greater accuracy and enable the analysis of large, complex genomes. We apply our pipeline to Zea mays, which has a genome comparable in size to human. Our pipeline begins with the comparison of mass spectra to a putative translation of the genome. We select novel peptides, those that match a region of the genome that was not previously known to be protein coding, for grouping into refinement events. We present a novel, probabilistic framework for evaluating the accuracy of each event. Our calculated event probability, or eventProb, considers the number of supporting peptides and spectra, and the quality of each supporting peptide-spectrum match. Our pipeline predicts 165 novel protein-coding genes and proposes updated models for 741 additional genes.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Genome, Plant
  • Genomics*
  • Humans
  • Mass Spectrometry
  • Open Reading Frames
  • Proteomics*
  • Zea mays / genetics*