Assessing the reliability of eBURST using simulated populations with known ancestry

BMC Microbiol. 2007 Apr 12;7:30. doi: 10.1186/1471-2180-7-30.

Abstract

Background: The program eBURST uses multilocus sequence typing data to divide bacterial populations into groups of closely related strains (clonal complexes), predicts the founding genotype of each group, and displays the patterns of recent evolutionary descent of all other strains in the group from the founder. The reliability of eBURST was evaluated using populations simulated with different levels of recombination in which the ancestry of all strains was known.
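The grouping and founder-prediction steps can be illustrated with a simplified sketch. It assumes the stringent default eBURST group definition: each sequence type (ST, an allelic profile across the MLST loci) joins a group if it shares identical alleles at all but one locus (i.e. is a single-locus variant, SLV) with at least one other member, so groups are connected components of the SLV graph. The founder is predicted as the ST with the most SLVs in its group, with ties broken by the number of double-locus variants (DLVs). The function names and allelic profiles are hypothetical. eBURST itself applies further tie-breaking rules and confidence estimates that are not reproduced here.

```python
from itertools import combinations

def locus_differences(a, b):
    """Count the loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def burst_groups(profiles, max_diff=1):
    """Link STs differing at <= max_diff loci; groups are connected components (union-find)."""
    parent = {st: st for st in profiles}

    def find(st):
        # Walk to the root, halving the path as we go.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return [sorted(g) for g in groups.values()]

def predict_founder(group, profiles):
    """Return the ST with the most SLVs in its group; ties go to the ST with more DLVs."""
    def score(st):
        diffs = [locus_differences(profiles[st], profiles[other])
                 for other in group if other != st]
        return (diffs.count(1), diffs.count(2))
    # max() keeps the first ST on any remaining tie (a simplification, not eBURST's rule).
    return max(group, key=score)

# Hypothetical 7-locus allelic profiles.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (2, 1, 1, 1, 1, 1, 1),
    "ST3": (1, 2, 1, 1, 1, 1, 1),
    "ST4": (1, 1, 2, 1, 1, 1, 1),
    "ST5": (2, 1, 1, 1, 1, 1, 5),
    "ST9": (9, 9, 9, 9, 9, 9, 9),
}

for group in burst_groups(profiles):
    print(group, "founder:", predict_founder(group, profiles))
```

In this toy example ST1, which has three SLVs, is predicted as the founder of the five-member group; ST5 joins only through its SLV link to ST2, and ST9 remains a singleton.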

Results: For strictly clonal simulations, where all allelic change is due to point mutation, the groups of related strains identified by eBURST were very similar to those expected from the true ancestry, and most of the true ancestor-descendant relationships (90-98%) were identified by eBURST. Populations simulated with low or moderate levels of recombination showed similarly high performance, but the reliability of eBURST declined as the recombination to mutation ratio increased. Populations simulated under a high recombination to mutation ratio were dominated by a single large straggly eBURST group, which resulted from the incorrect linking of unrelated groups of strains into the same eBURST group. The reliability of the ancestor-descendant links in eBURST diagrams was related to the proportion of strains in the largest eBURST group, which provides a useful guide to when eBURST is likely to be unreliable.
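The two performance measures reported in the Results, the share of true ancestor-descendant relationships recovered by eBURST and the proportion of strains in the largest eBURST group, reduce to simple ratios. The sketch below assumes that links are compared as directed (ancestor, descendant) pairs of ST labels and that singletons are counted as groups of one; the function names and data are hypothetical, and this is not the paper's evaluation code.

```python
def fraction_true_links_recovered(true_links, inferred_links):
    """Share of true (ancestor, descendant) pairs that also appear among the inferred links."""
    true_set = set(true_links)
    if not true_set:
        raise ValueError("no true links to compare against")
    return len(true_set & set(inferred_links)) / len(true_set)

def largest_group_proportion(groups):
    """Fraction of all strains in the largest group (singletons counted as groups of one)."""
    total = sum(len(g) for g in groups)
    return max(len(g) for g in groups) / total

# Hypothetical example: the true ancestry has ST5 descending from ST2,
# but the inferred diagram links ST5 directly to ST1.
true_links = [("ST1", "ST2"), ("ST1", "ST3"), ("ST1", "ST4"), ("ST2", "ST5")]
inferred_links = [("ST1", "ST2"), ("ST1", "ST3"), ("ST1", "ST4"), ("ST1", "ST5")]
groups = [["ST1", "ST2", "ST3", "ST4", "ST5"], ["ST9"]]

print(fraction_true_links_recovered(true_links, inferred_links))  # 0.75
print(round(largest_group_proportion(groups), 3))                 # 0.833
```

Because the abstract states only that reliability falls as the largest group grows, no cut-off is built in; any threshold for flagging unreliable output would be an additional assumption.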

Conclusion: Examination of eBURST groups within populations of a range of bacterial species showed that most species were within the range in which eBURST is reliable, and only a small number (e.g. Burkholderia pseudomallei and Enterococcus faecium) appeared to have rates of recombination high enough that eBURST is likely to be unreliable. The study also demonstrates how three simple tests in eBURST v3 can be used to detect unreliable eBURST performance and to recognise populations in which there appears to be a high rate of recombination relative to mutation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / classification*
  • Bacteria / genetics*
  • Computer Simulation*
  • Genotype
  • Models, Genetic
  • Mutation / genetics
  • Phylogeny*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software*