Rare variant association testing under low-coverage sequencing

Genetics. 2013 Jul;194(3):769-79. doi: 10.1534/genetics.113.150169. Epub 2013 May 1.

Abstract

Deep sequencing technologies enable the study of the effects of rare variants on disease risk. While methods have been developed to increase statistical power for detection of such effects, detecting subtle associations requires studies with hundreds or thousands of individuals, which is prohibitively costly. Recently, low-coverage sequencing has been shown to effectively reduce the cost of genome-wide association studies, using current sequencing technologies. However, current methods for disease association testing on rare variants cannot be applied directly to low-coverage sequencing data, as they require individual genotype data, which may not be called correctly due to low coverage and inherent sequencing errors. In this article, we propose two novel methods for detecting association of rare variants with disease risk, using low-coverage, error-prone sequencing. We show by simulation that our methods outperform previous methods under both low- and high-coverage sequencing and under different disease architectures. We use real data and simulation studies to demonstrate that, to maximize the power to detect associations for a fixed budget, it is desirable to include more samples while lowering coverage and to perform an analysis using our suggested methods.
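
To illustrate why individual genotypes may not be called correctly at low coverage, the following is a minimal sketch and not the methods proposed in the article. It assumes a simple binomial read model with a symmetric per-read error rate and a Hardy-Weinberg prior on genotypes; the function names, the error rate of 0.01, and the allele frequency of 0.005 are all hypothetical choices made for illustration.

```python
from math import comb


def alt_read_prob(genotype: int, error_rate: float) -> float:
    """Probability that one read shows the alternate allele.

    genotype is the number of alternate alleles carried (0, 1, or 2).
    Assumption: each read samples one of the two alleles uniformly and is
    flipped to the other allele with probability error_rate.
    """
    alt_fraction = genotype / 2
    return alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate


def genotype_likelihoods(alt_reads: int, depth: int, error_rate: float = 0.01):
    """Binomial likelihood of the observed reads under genotypes 0, 1, 2."""
    likelihoods = []
    for genotype in (0, 1, 2):
        p = alt_read_prob(genotype, error_rate)
        likelihoods.append(
            comb(depth, alt_reads) * p**alt_reads * (1 - p) ** (depth - alt_reads)
        )
    return likelihoods


def genotype_posteriors(alt_reads: int, depth: int, alt_freq: float,
                        error_rate: float = 0.01):
    """Posterior over genotypes 0, 1, 2 under a Hardy-Weinberg prior."""
    prior = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq**2]
    joint = [l * p for l, p in zip(genotype_likelihoods(alt_reads, depth, error_rate), prior)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # Chance that a heterozygous carrier shows no alternate reads at all.
    for depth in (2, 4, 30):
        print(depth, genotype_likelihoods(0, depth)[1])

    # One alternate read out of two, for a variant with frequency 0.005.
    print(genotype_posteriors(alt_reads=1, depth=2, alt_freq=0.005))
```

Under these assumptions, a heterozygous carrier sequenced at depth 2 shows no alternate reads a quarter of the time, and even a single alternate read out of two leaves the homozygous-reference genotype as the most probable call for a variant this rare. Hard genotype calls at such depths therefore tend to miss carriers, which is consistent with the article's motivation for working with the sequencing reads rather than called genotypes.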

Keywords: association test; low-coverage sequencing; pooling; rare variants; sequencing.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Gene Frequency*
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study
  • Genotype
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*
  • Sequence Analysis, DNA