ProphET, prophage estimation tool: A stand-alone prophage sequence prediction tool with self-updating reference database

PLoS One. 2019 Oct 2;14(10):e0223364. doi: 10.1371/journal.pone.0223364. eCollection 2019.

Abstract

Background: Prophages play a significant role in prokaryotic evolution, often altering the function of the cell they infect by transferring new genes (e.g., virulence or antibiotic resistance factors), inactivating existing genes, or modifying gene expression. Recently, phage therapy has attracted renewed interest as a promising alternative for controlling bacterial infections. Cataloging the repertoire of prophages in large collections of species' genomes is an important initial step in understanding their evolution and potential therapeutic utility. However, current widely used tools for identifying prophages within bacterial genome sequences are mainly web-based, can have long response times, and do not scale to keep pace with the many thousands of genomes now being sequenced routinely.

Methodology: In this work, we present ProphET, an easy-to-install prophage predictor for the Linux operating system, free of the constraints associated with a web-based tool. ProphET predictions rely on similarity searches against a database of prophage genes, taking as input a bacterial genome sequence in FASTA format and its corresponding gene annotation in GFF. ProphET identifies prophages in three steps: similarity search, calculation of the density of prophage genes, and edge refinement. ProphET performance was evaluated and compared with other phage predictors based on a set of 54 bacterial genomes containing 267 manually annotated prophages.
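The density and edge-refinement steps described above can be illustrated with a minimal sketch. It is not ProphET's implementation: the `Gene` record, the `candidate_regions` function, the 10-gene window, and the 5-hit threshold are all hypothetical choices made for illustration, and the `phage_hit` flag is assumed to have been set by a prior similarity search against a prophage gene database.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    start: int       # 1-based start coordinate, as in a GFF record
    end: int         # inclusive end coordinate
    phage_hit: bool  # assumed result of an earlier similarity search


def candidate_regions(
    genes: List[Gene], window: int = 10, min_hits: int = 5
) -> List[Tuple[int, int]]:
    """Return merged (start, end) spans where phage-like genes are dense.

    window and min_hits are illustrative values, not ProphET parameters.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    spans: List[Tuple[int, int]] = []
    # Slide a fixed-size window of consecutive genes along the genome.
    for i in range(max(len(ordered) - window + 1, 1)):
        hits = [g for g in ordered[i:i + window] if g.phage_hit]
        if len(hits) >= min_hits:
            # Crude edge trimming: keep only the stretch bounded by
            # phage-like genes, dropping non-hit genes at the window ends.
            spans.append((hits[0].start, max(g.end for g in hits)))
    # Merge overlapping spans produced by neighbouring windows.
    merged: List[Tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Counting hits per window of genes rather than per base pair keeps the sketch independent of gene length; a real tool may well define density differently.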

Findings and conclusions: ProphET identifies prophages in bacterial genomes with high precision and offers a fast, highly scalable alternative to widely-used web-based applications for prophage detection.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Databases, Nucleic Acid
  • Genome, Viral
  • Genomics / methods*
  • Molecular Sequence Annotation
  • Prophages / genetics*
  • Sensitivity and Specificity
  • Software*
  • Web Browser