Swarm v2: highly-scalable and high-resolution amplicon clustering

PeerJ. 2015 Dec 10:3:e1420. doi: 10.7717/peerj.1420. eCollection 2015.

Abstract

Previously we presented Swarm v1, a novel and open-source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked in two phases: an initial phase that used iterative single-linkage clustering with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low-abundance OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
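The abstract does not spell out how the d = 1 algorithm reaches linear scaling, so the following is only a minimal sketch of one way such a scheme could work, not the program's actual implementation. It assumes that every unique sequence is stored in a hash table, that the neighbours of a sequence are found by generating all of its single-edit variants and looking each one up, and that a sequence may only recruit neighbours of equal or lower abundance (standing in for the integrated breaking phase). The names microvariants, dereplicate and cluster_d1 are hypothetical and chosen for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def microvariants(seq):
    """Yield every sequence one substitution, deletion or insertion away from seq."""
    for i in range(len(seq)):
        for base in ALPHABET:
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]  # substitution at position i
        yield seq[:i] + seq[i + 1:]  # deletion of position i
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            yield seq[:i] + base + seq[i:]  # insertion before position i


def dereplicate(reads):
    """Merge identical reads (the d = 0 case) and count their abundance."""
    return Counter(reads)


def cluster_d1(abundance):
    """Cluster unique sequences with d = 1 (assumed scheme, see lead-in).

    abundance maps each unique sequence to its read count. Returns a list of
    clusters, each a list of sequences with the seed first.
    """
    unassigned = set(abundance)
    clusters = []
    # Seeds are taken by decreasing abundance; ties are broken alphabetically
    # so that the result does not depend on input order.
    for seed in sorted(abundance, key=lambda s: (-abundance[s], s)):
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        cluster = [seed]
        frontier = [seed]
        while frontier:
            parent = frontier.pop()
            for variant in sorted(set(microvariants(parent))):
                # Hash lookup instead of pairwise comparison; only recruit
                # neighbours that are not more abundant than the parent.
                if variant in unassigned and abundance[variant] <= abundance[parent]:
                    unassigned.discard(variant)
                    cluster.append(variant)
                    frontier.append(variant)
        clusters.append(cluster)
    return clusters
```

Under these assumptions, each sequence of length L generates a number of variants proportional to L, and each variant costs one hash lookup, so total work grows in proportion to the number of unique sequences rather than to the number of sequence pairs. For example, cluster_d1({"AAAA": 10, "AAAT": 2, "AATT": 8}) keeps "AATT" in its own cluster: it is one edit away from "AAAT", but its abundance is higher, so the chain is not extended.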

Keywords: Barcoding; Environmental diversity; Molecular operational taxonomic units.

Grants and funding

FM and MD were supported by the Deutsche Forschungsgemeinschaft (grant #DU1319/1-1). CQ is funded by an EPSRC Career Acceleration Fellowship (EP/H003851/1). CdeV was supported by the EU EraNet BiodivErsA program BioMarKs (grant #2008-6530), the French government “Investissements d’Avenir” project OCEANOMICS (ANR-11-BTBR-0008), and the EU FP7 program MicroB3 (contract number 287589). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.