Ancestry estimation and control of population stratification for sequence-based association studies

Nat Genet. 2014 Apr;46(4):409-15. doi: 10.1038/ng.2924. Epub 2014 Mar 16.

Abstract

Estimating individual ancestry is important in genetic association studies where population structure leads to false positive signals, although assigning ancestry remains challenging with targeted sequence data. We propose a new method for the accurate estimation of individual genetic ancestry, based on direct analysis of off-target sequence reads, and implement our method in the publicly available LASER software. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001×. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1×. On an even finer scale, the method improves discrimination between exome-sequenced study participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and to reduce the risk of spurious findings due to population structure.
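
As a rough illustration of why coverage as low as 0.001× can still carry ancestry information, the sketch below estimates how many reference-panel SNPs would be expected to have at least one read. It assumes that read depth at each site is Poisson-distributed with mean equal to the coverage. The panel size of 600,000 SNPs is a hypothetical value chosen for illustration. Neither assumption comes from the abstract, and this is not the LASER implementation.

```python
import math


def expected_covered_sites(n_sites: int, coverage: float) -> float:
    """Expected number of sites with at least one read.

    Assumes per-site read depth is Poisson with mean `coverage`,
    so P(depth >= 1) = 1 - exp(-coverage).
    """
    if n_sites < 0 or coverage < 0:
        raise ValueError("n_sites and coverage must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return n_sites * -math.expm1(-coverage)


if __name__ == "__main__":
    n_panel_snps = 600_000  # hypothetical reference-panel size
    # Coverage levels quoted in the abstract.
    for cov in (0.001, 0.1):
        covered = expected_covered_sites(n_panel_snps, cov)
        print(f"coverage {cov}x: about {covered:.0f} SNPs with >= 1 read")
```

Under these assumptions, even very sparse data touches hundreds of panel SNPs, each contributing a small amount of information about ancestry.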

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Validation Study

MeSH terms

  • Base Sequence
  • Computer Simulation
  • Genetic Association Studies / methods*
  • Genetics, Population / methods*
  • Models, Genetic*
  • Molecular Sequence Data
  • Polymorphism, Single Nucleotide / genetics
  • Principal Component Analysis
  • Sequence Analysis, DNA
  • Software*