Strategies to avoid wrongly labelled genomes using as example the detected wrong taxonomic affiliation for Aeromonas genomes in the GenBank database

PLoS One. 2015 Jan 21;10(1):e0115813. doi: 10.1371/journal.pone.0115813. eCollection 2015.

Abstract

Around 27,000 prokaryote genomes are currently deposited in the Genome database of GenBank at the National Center for Biotechnology Information (NCBI), and this number is growing exponentially. However, it is not known how many of these genomes are correctly assigned to their designated taxon. The taxonomic affiliation of 44 Aeromonas genomes deposited at the NCBI (only five of them from type strains) was determined by multilocus phylogenetic analysis (MLPA) and by pairwise average nucleotide identity (ANI). On the basis of both the MLPA and ANI results, the taxon assignment was discordant for 14 (35.9%) of the 39 non-type-strain genomes. The data also showed that, if the genome of the type strain is not available, a correctly identified genome of the same species can be used as the reference for ANI calculations. Of the three ANI calculation tools compared (ANI calculator, EzGenome and JSpecies), EzGenome and JSpecies gave very similar results, whereas the ANI calculator gave higher intra- and inter-species values than the other two (differences of 0.06-0.82% and 0.92-3.38%, respectively). Nevertheless, all three tools produced the same species classification for the studied Aeromonas genomes. To avoid misinterpreting ANI calculator results, particularly when values lie near the 95% species cutoff, the ANI calculator should be used in combination with one of the other tools (EzGenome or JSpecies). It is recommended that, once a genome sequence is obtained, its taxonomic affiliation be verified by ANI or MLPA before it is submitted to the NCBI, and that researchers amend the taxonomic errors already present in databases.
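The species decision described above rests on comparing a pairwise ANI value against the 95% cutoff and on cross-checking borderline ANI calculator values with a second tool. The following is a minimal Python sketch of that logic under stated assumptions, not the method of any of the three tools. `ani_from_hits` averages the identities of query-genome fragments whose best hit passes identity and aligned-length filters; the 30% identity and 70% aligned-length thresholds are assumptions modeled on the BLAST-based ANIb approach and should be checked against each tool's documentation. The alignment step itself (for example, BLASTN of genome fragments against the reference genome) is assumed to have been run elsewhere. `species_call` is a hypothetical decision rule, and its 1-point borderline `margin` is an illustrative choice rather than a value from the study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional

SPECIES_CUTOFF = 95.0  # ANI (%) species boundary cited in the abstract


@dataclass
class FragmentHit:
    """Best alignment of one query-genome fragment against the reference genome."""
    identity: float          # percent identity of the hit (0-100)
    aligned_fraction: float  # fraction of the fragment length that aligned (0-1)


def ani_from_hits(hits: Iterable[FragmentHit],
                  min_identity: float = 30.0,   # assumed ANIb-style threshold
                  min_aligned: float = 0.70) -> float:  # assumed ANIb-style threshold
    """Return the mean identity of the fragments that pass both filters."""
    kept = [h.identity for h in hits
            if h.identity > min_identity and h.aligned_fraction > min_aligned]
    if not kept:
        raise ValueError("no fragment passed the identity/alignment filters")
    return mean(kept)


def species_call(ani_calc: float, other: Optional[float] = None,
                 margin: float = 1.0) -> str:
    """Hypothetical rule: 'same', 'different' or 'unresolved' for one genome pair.

    ani_calc: value from the ANI calculator.
    other:    value for the same pair from a second tool (EzGenome or JSpecies).
    margin:   illustrative distance from the cutoff treated as borderline.
    """
    borderline = abs(ani_calc - SPECIES_CUTOFF) < margin
    if not borderline:
        # Clear-cut value: a single tool is enough.
        return "same" if ani_calc >= SPECIES_CUTOFF else "different"
    if other is None:
        # Borderline and no second tool to cross-check against.
        return "unresolved"
    calc_same = ani_calc >= SPECIES_CUTOFF
    other_same = other >= SPECIES_CUTOFF
    if calc_same != other_same:
        # The two tools put the pair on opposite sides of the cutoff.
        return "unresolved"
    return "same" if calc_same else "different"
```

Returning "unresolved" when the two tools disagree is a deliberately conservative choice. Because the study found the ANI calculator to give higher values than EzGenome and JSpecies, one could instead defer to the second tool in such cases.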

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aeromonas / classification*
  • Aeromonas / genetics*
  • Databases, Nucleic Acid*
  • Genome, Bacterial*
  • Sequence Analysis, DNA*

Grants and funding

This work was supported in part by project AGL2011-30461-C02-02 of the “Ministerio de Ciencia e Innovación” (Ministry of Science and Innovation, Spain) and by funding from the European Union Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 311846. The authors are solely responsible for the content of this publication. It does not represent the opinion of the European Commission. The European Commission is not responsible for any use that might be made of data appearing therein. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.