Joint Estimation of Contamination, Error and Demography for Nuclear DNA from Ancient Humans

PLoS Genet. 2016 Apr 6;12(4):e1005972. doi: 10.1371/journal.pgen.1005972. eCollection 2016 Apr.

Abstract

When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters (including drift times and admixture rates) for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called 'Demographic Inference with Contamination and Error' (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%.
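To illustrate the general idea of co-estimating a contamination rate and an error rate by MCMC, the following is a minimal Python sketch. It is not the DICE implementation and not the model from the paper. It assumes a deliberately simplified setting: the endogenous genotype at each site is treated as known, the contaminant panel supplies a derived-allele frequency per site, reads are independent, and flat priors are used. The demographic parameters (drift times, admixture rates) are left out entirely. All function and variable names are hypothetical.

```python
import math
import random


def read_derived_prob(r, e, w, g):
    """Probability that a single read shows the derived allele (toy model).

    r: contamination rate, e: per-read error rate,
    w: derived-allele frequency in the contaminant panel,
    g: endogenous genotype (0, 1 or 2 derived copies), assumed known here.
    """
    q = r * w + (1.0 - r) * g / 2.0       # mixture before sequencing error
    return q * (1.0 - e) + (1.0 - q) * e  # an error flips the observed allele


def log_likelihood(r, e, sites):
    """Binomial log-likelihood over sites, dropping constant terms."""
    total = 0.0
    for w, g, derived, depth in sites:
        p = read_derived_prob(r, e, w, g)
        total += derived * math.log(p) + (depth - derived) * math.log(1.0 - p)
    return total


def simulate_sites(n_sites, depth, r, e, rng):
    """Simulate (w, g, derived_reads, depth) tuples under the toy model."""
    sites = []
    for _ in range(n_sites):
        w = rng.random()
        g = rng.choice((0, 1, 2))
        p = read_derived_prob(r, e, w, g)
        derived = sum(rng.random() < p for _ in range(depth))
        sites.append((w, g, derived, depth))
    return sites


def metropolis(sites, n_iter, rng, step_r=0.02, step_e=0.005):
    """Random-walk Metropolis over (r, e) with flat priors on (0,1) x (0,0.5)."""
    r, e = 0.5, 0.05  # arbitrary starting point
    ll = log_likelihood(r, e, sites)
    samples = []
    for _ in range(n_iter):
        r_new = r + rng.gauss(0.0, step_r)
        e_new = e + rng.gauss(0.0, step_e)
        # Proposals outside the prior support are rejected outright.
        if 0.0 < r_new < 1.0 and 0.0 < e_new < 0.5:
            ll_new = log_likelihood(r_new, e_new, sites)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                r, e, ll = r_new, e_new, ll_new
        samples.append((r, e))
    return samples


def posterior_means(samples, burn_in):
    """Average the retained samples after discarding the burn-in."""
    kept = samples[burn_in:]
    r_mean = sum(s[0] for s in kept) / len(kept)
    e_mean = sum(s[1] for s in kept) / len(kept)
    return r_mean, e_mean


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data: 20% contamination, 1% error, 500 sites at depth 20.
    sites = simulate_sites(500, 20, r=0.2, e=0.01, rng=rng)
    samples = metropolis(sites, n_iter=3000, rng=rng)
    r_hat, e_hat = posterior_means(samples, burn_in=1000)
    print(f"contamination ~ {r_hat:.3f}, error ~ {e_hat:.4f}")
```

In this toy setting the two rates are separable because contamination pulls read counts towards the panel frequency at every site, while error shifts them symmetrically regardless of the panel. A fuller treatment along the lines the abstract describes would additionally treat the genotype as unknown and estimate the demographic parameters.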

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Computer Simulation
  • DNA / genetics*
  • DNA Contamination*
  • DNA, Mitochondrial / genetics
  • Fossils
  • Genetic Drift*
  • Genetics, Population
  • Humans
  • Markov Chains
  • Monte Carlo Method
  • Neanderthals / genetics*
  • Sequence Analysis, DNA
  • Software

Substances

  • DNA, Mitochondrial
  • DNA