Reduced representation approaches produce similar results to whole genome sequencing for some common phylogeographic analyses

PLoS One. 2023 Nov 30;18(11):e0291941. doi: 10.1371/journal.pone.0291941. eCollection 2023.

Abstract

When designing phylogeographic investigations, researchers can choose among many types of molecular markers, including mitochondrial genes or genomes, single nucleotide polymorphisms (SNPs) from reduced representation protocols, large sequence capture data sets, and even whole genomes. Given that the statistical power and accuracy of various analyses are expected to differ depending on both the type of marker and the amount of data collected, an exploration of how results vary as a function of marker type should provide valuable information to researchers. Here we collect mitochondrial Cytochrome b sequences, whole mitochondrial genomes, SNPs isolated using a genotyping-by-sequencing (GBS) protocol, sequences from ultraconserved elements, and low-coverage nuclear genomes from the North American water vole (Microtus richardsoni). We estimate genetic distances, population genetic structure, and historical demography from each of these datasets and compare the results across markers. As anticipated, the results differ across marker types, particularly in the resolution offered by different analyses. A cost-benefit analysis indicates that SNPs collected using a GBS protocol are the most cost-effective molecular marker, with inferences that mirror those drawn from the whole genome data at a fraction of the cost per sample.
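
As an illustration of the simplest kind of genetic distance estimate mentioned above, the following is a minimal sketch of an uncorrected p-distance between two aligned sequences. The abstract does not state which distance measure the authors used, so the choice of p-distance is an assumption, and the function name `p_distance` and the toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned DNA sequences.

    Only sites where both sequences carry an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes such as N are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    compared = 0   # sites where both bases are unambiguous
    differing = 0  # compared sites where the bases differ

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1

    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


if __name__ == "__main__":
    # Toy aligned fragments (made up for illustration, not real vole data).
    sample_1 = "ACGTACGT"
    sample_2 = "ACGTACGA"
    print(p_distance(sample_1, sample_2))
```

Skipping gaps and ambiguous bases matters most for low-coverage data, where missing calls are common; a model-corrected distance would be a natural replacement when sequences are more divergent.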

MeSH terms

  • Genome* / genetics
  • Genotype
  • High-Throughput Nucleotide Sequencing / methods
  • Polymorphism, Single Nucleotide*
  • Whole Genome Sequencing