A protein database constructed from low-coverage genomic sequence of Bacillus megaterium and its use for accelerated proteomic analysis

J Biotechnol. 2006 Jul 25;124(3):486-95. doi: 10.1016/j.jbiotec.2006.01.033. Epub 2006 Mar 29.

Abstract

Peptide mass fingerprint (PMF) matching is a high-throughput method for identifying protein spots from two-dimensional gel electrophoresis (2DE). However, its success depends largely on whether the proteins to be identified are present in the database searched. When they are not, it is often necessary to apply more sophisticated but more time-consuming technologies to generate sequence tags for definitive protein identification. Meanwhile, modern sequencing technologies generate large quantities of DNA sequence that first become available in unfinished form or at low genome coverage, because finishing and annotation are time-consuming, rate-limiting steps. We recently started to sequence the genome of Bacillus megaterium DSM 319, a bacterium of industrial interest. In this study, we demonstrate that a protein database generated from unfinished genomic sequence of this bacterium at merely three-fold coverage allows fast and reliable protein spot identification based solely on PMF from high-throughput MALDI-TOF MS analysis. We further show that, for protein spot identification via PMF, this strain-specific database from low-coverage genomic sequence greatly outperforms the commonly used cross-species databases constructed from 13 completely sequenced Bacillus strains.
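To illustrate why PMF matching depends on the database containing the right protein, the following is a minimal sketch of the general technique, not the pipeline used in this study. It assumes tryptic digestion with no missed cleavages, singly protonated monoisotopic peptide masses, and a simple hit count as the score; the function names, the 50 ppm tolerance, and the minimum peptide length are illustrative choices, not values from the abstract.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # singly charged ion, as typically observed in MALDI-TOF


def tryptic_digest(sequence, min_length=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


def peptide_mh(peptide):
    """Return the [M+H]+ mass, or None if a non-standard residue (e.g. X) occurs."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    except KeyError:
        return None


def theoretical_masses(sequence, min_length=6):
    """Sorted [M+H]+ masses of all tryptic peptides that could be computed."""
    masses = (peptide_mh(p) for p in tryptic_digest(sequence, min_length))
    return sorted(m for m in masses if m is not None)


def count_hits(observed, theoretical, tol_ppm=50.0):
    """Count observed peaks lying within tol_ppm of any theoretical mass."""
    hits = 0
    for obs in observed:
        tolerance = obs * tol_ppm / 1e6
        if any(abs(obs - theo) <= tolerance for theo in theoretical):
            hits += 1
    return hits


def rank_candidates(observed, database, tol_ppm=50.0):
    """Rank database entries (id -> sequence) by number of matched peaks."""
    scored = [
        (protein_id, count_hits(observed, theoretical_masses(seq), tol_ppm))
        for protein_id, seq in database.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_candidates(peak_list, database)` with a dictionary of protein sequences returns the entries ordered by matched peaks. If the true protein, or a close enough homolog, is missing from `database`, no entry accumulates enough hits, which is the limitation a strain-specific database is meant to address.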

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacillus megaterium / genetics*
  • Bacterial Proteins / genetics*
  • Base Sequence
  • Chromosome Mapping / methods*
  • Databases, Protein*
  • Genome, Bacterial / genetics
  • Information Storage and Retrieval / methods
  • Molecular Sequence Data
  • Peptide Mapping / methods*
  • Proteome / genetics*
  • Proteomics / methods
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*

Substances

  • Bacterial Proteins
  • Proteome