High-throughput identification of informative nuclear loci for shallow-scale phylogenetics and phylogeography

Syst Biol. 2012 Oct;61(5):745-61. doi: 10.1093/sysbio/sys051. Epub 2012 May 18.

Abstract

One of the major challenges for researchers studying phylogeography and shallow-scale phylogenetics is identifying highly variable nuclear loci that are informative for the question of interest. Previous approaches to locus identification have generally required extensive testing of anonymous nuclear loci developed from genomic libraries of the target taxon, testing of loci of unknown utility from other systems, or identification of loci from the nearest model organism with genomic resources. Here, we present a fast and economical approach to generating thousands of variable, single-copy nuclear loci for any system using next-generation sequencing. We performed Illumina paired-end sequencing of three reduced-representation libraries (RRLs) in chorus frogs (Pseudacris) to identify orthologous, single-copy loci across libraries and to estimate sequence divergence at multiple taxonomic levels. We also conducted PCR testing of these loci across the genus Pseudacris and outgroups to determine whether loci developed for phylogeography can be extended to deeper phylogenetic levels. Prior to sequencing, we conducted in silico digestion of the most closely related reference genome (Xenopus tropicalis) to generate expectations for the number of loci and degree of coverage for a particular experimental design. Using the RRL approach, we: (i) identified more than 100,000 single-copy nuclear loci, 6339 of which were obtained for divergent conspecifics and 904 of which were obtained for heterospecifics; (ii) estimated average nuclear sequence divergence at 0.1% between alleles within an individual, 1.1% between conspecific individuals that represent two different clades, and 1.8% between species; and (iii) determined from PCR testing that 53% of the loci successfully amplify within species and that many also amplify at the genus level and deeper in the phylogeny (16%).
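The in silico digestion step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the recognition site (EcoRI, `GAATTC`, cutting after the G), the size-selection window, and the read count are hypothetical placeholders, and the expected coverage assumes reads are spread evenly across the size-selected fragments. In practice the input would be the Xenopus tropicalis assembly read from FASTA rather than a toy string.

```python
import re


def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths from cutting `sequence` at every `site`.

    cut_offset is where the enzyme cuts within the recognition site on the
    top strand (1 for EcoRI, G^AATTC). Assumes a palindromic site, so
    scanning one strand finds every cut. A lookahead regex is used so
    overlapping occurrences are not missed.
    """
    seq = sequence.upper()
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces (e.g. a cut exactly at a sequence end).
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def expected_loci_and_coverage(fragment_lengths, min_len, max_len, n_read_pairs):
    """Count size-selected fragments (candidate loci) and the mean read
    pairs per locus, assuming uniform sampling of selected fragments."""
    selected = [n for n in fragment_lengths if min_len <= n <= max_len]
    n_loci = len(selected)
    coverage = n_read_pairs / n_loci if n_loci else 0.0
    return n_loci, coverage


if __name__ == "__main__":
    # Toy "genome" with two EcoRI sites; the window and read count are hypothetical.
    toy = "AAA" + "GAATTC" + "A" * 10 + "GAATTC" + "AAAA"
    lengths = digest(toy, "GAATTC", 1)
    print(lengths)  # [4, 16, 9]
    print(expected_loci_and_coverage(lengths, 5, 20, 1000))  # (2, 500.0)
```

Varying the enzyme and the size window in a sketch like this shows the trade-off an RRL design has to balance: a wider window yields more loci but spreads a fixed number of reads more thinly across them.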
Our study identified nuclear loci with levels of sequence divergence on par with those of the mitochondrial loci commonly used in phylogeography. Specifically, we estimated that ~7% of loci in the chorus frog genome are >3% divergent within species; this translates to a prediction of approximately 50,000 single-copy loci in the genome with >3% divergence. Moreover, successful amplification of many loci at deeper phylogenetic levels indicates that the RRL approach is an efficient method for rapidly identifying informative loci for both phylogenetics and phylogeography. We conclude by making recommendations for minimizing the cost and maximizing the efficiency of locus identification in future studies in this field.
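The divergence estimates above rest on per-locus pairwise distances. The sketch below is a minimal illustration under stated assumptions, not the study's procedure. It assumes each locus is available as a pair of pre-aligned sequences (two alleles, or two individuals). It computes the uncorrected p-distance over unambiguous A/C/G/T sites and the fraction of loci exceeding a 3% threshold. It then extrapolates genome-wide, where `total_loci` is a hypothetical input rather than a figure reported in the study.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Only columns where both bases are unambiguous (A, C, G, T) are compared;
    gaps and ambiguity codes such as N are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0


def fraction_above(distances, threshold=0.03):
    """Fraction of loci whose divergence is strictly greater than threshold."""
    if not distances:
        return 0.0
    return sum(d > threshold for d in distances) / len(distances)


def extrapolate(fraction, total_loci):
    """Scale a sampled fraction to a hypothetical genome-wide locus count."""
    return round(fraction * total_loci)


if __name__ == "__main__":
    print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
    sample = [0.01, 0.02, 0.05, 0.04]  # toy per-locus distances
    frac = fraction_above(sample)
    print(frac, extrapolate(frac, 1000))  # 0.5 500
```

The uncorrected p-distance keeps the sketch simple. At divergences of a few percent, a model-corrected distance such as Jukes-Cantor differs only slightly (about 3.06% for an observed 3%).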

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Anura / classification
  • Anura / genetics*
  • Cell Nucleus / genetics
  • Computer Simulation
  • Evolution, Molecular*
  • High-Throughput Nucleotide Sequencing / economics
  • High-Throughput Nucleotide Sequencing / methods*
  • Phylogeny*
  • Phylogeography / methods*
  • Polymerase Chain Reaction
  • Sequence Analysis, DNA / economics
  • Sequence Analysis, DNA / methods*