The impact of library preparation protocols on the consistency of allele frequency estimates in Pool-Seq data

Mol Ecol Resour. 2016 Jan;16(1):118-22. doi: 10.1111/1755-0998.12432. Epub 2015 Jun 9.

Abstract

Sequencing pools of individuals (Pool-Seq) is a cost-effective method to determine genome-wide allele frequency estimates. Given the importance of meta-analyses combining data sets, we determined the influence of different genomic library preparation protocols on the consistency of allele frequency estimates. We found that typically no more than 1% of the variation in allele frequency estimates could be attributed to differences in library preparation. Read length also had only a minor effect on the consistency of allele frequency estimates. By far the most pronounced influence could be attributed to sequence coverage. Increasing the coverage from 30- to 50-fold improved the consistency of allele frequency estimates by at least 27%. We conclude that Pool-Seq data can be easily combined across different library preparation methods, but sufficient sequence coverage is key to reliable results.
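The dependence on coverage can be illustrated with a minimal sketch. It assumes the simplest possible model, in which the reads covering a site are independent binomial draws from the true pool frequency. This is an assumption made for illustration only and is not the consistency measure used in the study. The function names `binomial_se` and `relative_se_reduction` are hypothetical. Under this model the standard error of a frequency estimate is sqrt(p(1-p)/C) for coverage C, so the relative gain from higher coverage does not depend on the allele frequency p.

```python
import math


def binomial_se(p, coverage):
    """Standard error of an allele frequency estimate from `coverage` reads,
    assuming pure binomial read sampling (illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return math.sqrt(p * (1.0 - p) / coverage)


def relative_se_reduction(cov_low, cov_high):
    """Fractional drop in standard error when coverage rises from
    cov_low to cov_high; independent of p under the binomial model."""
    if cov_low <= 0 or cov_high <= 0:
        raise ValueError("coverage must be positive")
    return 1.0 - math.sqrt(cov_low / cov_high)


# Compare 30-fold and 50-fold coverage at an intermediate frequency.
se30 = binomial_se(0.5, 30)
se50 = binomial_se(0.5, 50)
reduction = relative_se_reduction(30, 50)

print(f"SE at 30x: {se30:.4f}")
print(f"SE at 50x: {se50:.4f}")
print(f"Relative reduction in SE: {reduction:.1%}")
```

This toy model ignores the sampling of individuals into the pool, mapping and sequencing errors, and variation in coverage across sites, so its output is not directly comparable to the empirical improvement of at least 27% reported above. It only shows why read sampling noise alone already makes coverage a strong lever on consistency.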

Keywords: Drosophila; NGS libraries; Pool-Seq; population genetics-empirical.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Drosophila / genetics*
  • Gene Frequency
  • Gene Library*
  • Genotype
  • High-Throughput Nucleotide Sequencing

Associated data

  • GENBANK/ERR557048
  • GENBANK/ERR557049
  • GENBANK/ERR557050
  • GENBANK/ERR557051
  • GENBANK/ERR557052
  • GENBANK/ERR557053
  • GENBANK/ERR557054
  • GENBANK/ERR557055
  • GENBANK/ERR557056
  • GENBANK/ERR557057
  • GENBANK/ERR557058
  • GENBANK/ERR557059
  • GENBANK/ERR557060
  • GENBANK/ERR557061
  • GENBANK/ERR557062
  • GENBANK/ERR557063
  • GENBANK/ERR557064
  • GENBANK/ERR557065
  • GENBANK/ERR557066
  • GENBANK/ERR557067
  • GENBANK/ERR832532
  • GENBANK/ERR832533
  • GENBANK/ERR832534
  • GENBANK/ERR832535
  • Dryad/10.5061/dryad.P31J8