Epidemiological data analysis of viral quasispecies in the next-generation sequencing era

Brief Bioinform. 2021 Jan 18;22(1):96-108. doi: 10.1093/bib/bbaa101.

Abstract

The unprecedented coverage offered by next-generation sequencing (NGS) technology has made it possible to assess the complexity of intra-host RNA viral populations in fine detail. Consequently, analysis of NGS datasets can be used to infer crucial epidemiological and biomedical information at the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis that can deal with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle large-scale NGS datasets and to properly model the evolution of heterogeneous, rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools that analyze NGS data for (i) characterization of intra-host viral population complexity, including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistance mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response to and control of outbreaks.

Keywords: haplotype calling; next-generation sequencing; outbreak investigation; quasispecies; surveillance systems; variant calling.

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Epidemiological Monitoring*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • RNA Virus Infections / epidemiology
  • RNA Virus Infections / virology*
  • RNA Viruses / classification
  • RNA Viruses / genetics*
  • RNA Viruses / isolation & purification
  • RNA Viruses / pathogenicity