Microbiome depiction through user-adapted bioinformatic pipelines and parameters

J Med Microbiol. 2023 Oct;72(10). doi: 10.1099/jmm.0.001756.

Abstract

Introduction. The role of the microbiome in health and disease continues to be increasingly recognized. However, there is significant variability in the bioinformatic protocols for analysing genomic data. This, in part, has impeded the potential incorporation of microbiomics into the clinical setting and has challenged interstudy reproducibility. In microbial compositional analysis, there is a growing recognition of the need to move away from a one-size-fits-all approach to data processing.

Gap Statement. Few evidence-based recommendations exist for setting parameters of programs that infer microbiota community profiles, despite these parameters significantly impacting the accuracy of taxonomic inference.

Aim. To compare three commonly used programs (DADA2, QIIME2, and mothur) and optimize them into four user-adapted pipelines for processing paired-end amplicon reads. We aim to increase the accuracy of compositional inference and help standardize microbiomic protocols.

Methods. Two key parameters were isolated across four pipelines: filtering sequence reads based on a whole-number error threshold (maxEE) and truncating read ends based on a quality score threshold (QTrim). Closeness of sample inference was then evaluated using a mock community of known composition.

Results. We observed that raw genomic data lost were proportionate to how stringently parameters were set. Exactly how much data were lost varied by pipeline. Accuracy of sample inference correlated with increased sequence read retention. Falsely detected taxa and unaccounted-for microbial constituents were unique to pipeline and parameter. Implementation of optimized parameter values led to better approximation of the known mock community.

Conclusions. Microbial compositions generated based on the 16S rRNA marker gene should be interpreted with caution. To improve microbial community profiling, bioinformatic protocols must be user-adapted. Analysis should be performed with consideration for the select target amplicon, pipelines and parameters used, and taxa of interest.
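
As an illustration of the two parameters named in the Methods, the following is a minimal Python sketch, not the authors' pipeline code. It assumes the conventional definition of expected errors (the sum of per-base error probabilities, with p = 10^(-Q/10) for Phred score Q) and assumes that quality truncation cuts a read at the first base whose score is at or below the threshold, as DADA2 documents for its truncQ option. The function names, the `min_len` argument, and the toy reads are hypothetical, and each of the compared programs may implement these steps differently.

```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base Phred quality scores)


def expected_errors(quals: List[int]) -> float:
    """Sum of per-base error probabilities, p = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def truncate_at_quality(seq: str, quals: List[int], trunc_q: int) -> Read:
    """Cut the read at the first base whose quality is <= trunc_q (assumed rule)."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i], quals[:i]
    return seq, quals


def filter_reads(reads: List[Read], max_ee: float, trunc_q: int,
                 min_len: int = 1) -> List[Read]:
    """Truncate each read, then keep it only if it is long enough and
    its expected errors do not exceed max_ee."""
    kept = []
    for seq, quals in reads:
        seq, quals = truncate_at_quality(seq, quals, trunc_q)
        if len(seq) >= min_len and expected_errors(quals) <= max_ee:
            kept.append((seq, quals))
    return kept


# Toy reads (hypothetical): high quality, one very poor base, uniformly low quality.
reads: List[Read] = [
    ("ACGTACGT", [38] * 8),
    ("ACGTACGT", [38, 38, 38, 38, 2, 38, 38, 38]),
    ("ACGTACGT", [10] * 8),
]

strict = filter_reads(reads, max_ee=0.5, trunc_q=2)
lenient = filter_reads(reads, max_ee=2.0, trunc_q=2)
print(len(strict), len(lenient))
```

In this toy case the stricter expected-error threshold retains fewer reads than the lenient one, which mirrors the kind of trade-off between stringency and read retention that the Results describe.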

Keywords: ASVs; ESVs; OTUs; UniFrac distance; amplicon sequencing; zOTUs.

MeSH terms

  • Computational Biology / methods
  • Genomics
  • High-Throughput Nucleotide Sequencing / methods
  • Microbiota*
  • RNA, Ribosomal, 16S / genetics
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods

Substances

  • RNA, Ribosomal, 16S