Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species

PLoS Genet. 2022 Jan 10;18(1):e1009604. doi: 10.1371/journal.pgen.1009604. eCollection 2022 Jan.

Abstract

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetic analyses. They are an important source of genetic diversity and can also have functional impact. Although bioinformatic methods now permit large-scale, genome-wide genotyping of STRs from whole-genome sequencing data, these methods have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we genotyped STRs with HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resulting callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structure and genomic signatures of selection, and the results were compared with those from genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, genome-wide information on genetic variation and other characteristics of STRs in P. falciparum and P. vivax is available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).
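The abstract does not state which diversity statistic was computed for the STR loci. As an illustration only, a widely used per-locus measure for multiallelic markers such as STRs is Nei's unbiased expected heterozygosity, He = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of allele i among n allele calls. The sketch below assumes one haploid allele call per sample (reasonable for blood-stage Plasmodium, but it ignores mixed infections). The function name `expected_heterozygosity` is hypothetical and is not part of HipSTR or PlasmoSTR.

```python
from collections import Counter
from typing import Hashable, Sequence


def expected_heterozygosity(alleles: Sequence[Hashable]) -> float:
    """Nei's unbiased expected heterozygosity at a single STR locus.

    Hypothetical helper for illustration. `alleles` holds one allele call
    per sample (e.g. repeat length in bp relative to the reference);
    haploid calls are assumed and missing calls should be removed first.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele calls")
    counts = Counter(alleles)
    # Sum of squared allele frequencies (probability two draws are identical)
    sum_p_sq = sum((c / n) ** 2 for c in counts.values())
    # Small-sample correction factor n/(n-1)
    return n / (n - 1) * (1.0 - sum_p_sq)


if __name__ == "__main__":
    # Two alleles at equal frequency in four samples
    print(expected_heterozygosity([0, 0, 3, 3]))
    # A monomorphic locus carries no diversity
    print(expected_heterozygosity([5, 5, 5]))
```

Averaging this value over all retained loci in a population gives one simple genome-wide summary; comparisons between STR-based and SNP-based diversity would need equivalent statistics for both marker types.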

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Genetic
  • Genetics, Population
  • Genotyping Techniques / methods*
  • Humans
  • Logistic Models
  • Malaria / parasitology*
  • Microsatellite Repeats*
  • Plasmodium falciparum / genetics*
  • Plasmodium vivax / genetics*
  • Polymorphism, Single Nucleotide
  • Species Specificity
  • Whole Genome Sequencing