Rare variant testing of imputed data: an analysis pipeline typified

Hum Hered. 2014;78(3-4):164-78. doi: 10.1159/000368676. Epub 2014 Dec 10.

Abstract

Important methodological advancements in rare variant association testing have been made recently, among them collapsing tests, kernel methods and the variable threshold (VT) technique. Typically, rare variants from a region of interest are tested for association as a group ('bin'). Rare variant studies are already routinely performed as whole-exome sequencing studies. As an alternative approach, we propose a pipeline for rare variant analysis of imputed data and develop corresponding quality control criteria. We provide suggestions for the choice and construction of analysis bins in whole-genome applications and support the analysis with implementations of standard burden tests (COLL, CMAT) in our INTERSNP-RARE software. In addition, three rare variant regression tests (REG, FRACREG and COLLREG) are implemented. All tests are accompanied by the VT approach, which optimizes the definition of 'rareness'. We integrate kernel tests as implemented in SKAT/SKAT-O into the suggested strategies. We then apply our analysis scheme to a genome-wide association study of Alzheimer's disease. Further, we show that our pipeline leads to valid significance testing procedures with controlled type I error rates. Strong association signals surrounding the known APOE locus demonstrate statistical power. In addition, we highlight several suggestive rare variant association findings for follow-up studies, including genomic regions overlapping MCPH1, MED18 and NOTCH3. In summary, we describe and support a straightforward and cost-efficient rare variant analysis pipeline for imputed data and demonstrate its feasibility and validity. The strategy can complement rare variant studies with next-generation sequencing data.
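
To make the combination of a collapsing test with the VT idea concrete, the following is a minimal Python sketch. It is not the INTERSNP-RARE implementation: the function names, the 2x2 chi-square statistic, the use of every observed minor allele frequency (MAF) as a candidate threshold, and the permutation scheme are all illustrative assumptions. Genotypes are assumed to be hard-called minor allele counts (0, 1, 2) for the variants of one bin, and MAFs are assumed to be supplied by the caller.

```python
import random


def collapse(genotypes, mafs, threshold):
    """Collapse a bin: 1 if an individual carries any minor allele
    at a variant whose MAF is at or below the threshold, else 0."""
    keep = [j for j, f in enumerate(mafs) if f <= threshold]
    return [int(any(row[j] > 0 for j in keep)) for row in genotypes]


def carrier_chi2(indicator, is_case):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    of carrier status versus case/control status."""
    a = sum(1 for c, y in zip(indicator, is_case) if y and c)          # case carriers
    b = sum(1 for c, y in zip(indicator, is_case) if y and not c)      # case non-carriers
    c_ = sum(1 for c, y in zip(indicator, is_case) if not y and c)     # control carriers
    d = sum(1 for c, y in zip(indicator, is_case) if not y and not c)  # control non-carriers
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table (e.g. no carriers at all)
    n = a + b + c_ + d
    return n * (a * d - b * c_) ** 2 / denom


def vt_collapsing_test(genotypes, is_case, mafs, n_perm=999, seed=0):
    """Variable-threshold collapsing test (illustrative).

    The statistic is maximized over all candidate MAF thresholds, and the
    same maximization is repeated on permuted case/control labels, so the
    returned p-value accounts for the threshold search.
    """
    thresholds = sorted(set(mafs))
    # Carrier indicators depend only on genotypes, so compute them once.
    indicators = [collapse(genotypes, mafs, t) for t in thresholds]

    def max_stat(labels):
        return max(carrier_chi2(ind, labels) for ind in indicators)

    observed = max_stat(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_stat(labels) >= observed:
            hits += 1
    # Add-one correction keeps the permutation p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `vt_collapsing_test(genotypes, is_case, mafs)` on the variants of one bin returns the maximized statistic and its permutation p-value. Because the permutation loop repeats the maximization over thresholds, the p-value does not need a separate multiple-testing correction for the threshold search within that bin. Imputed dosages would need an additional calling or weighting step that this sketch does not cover.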

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease / epidemiology
  • Alzheimer Disease / genetics*
  • Case-Control Studies
  • Genetic Variation*
  • Genome, Human
  • Genome-Wide Association Study / statistics & numerical data*
  • Genotype
  • Germany / epidemiology
  • Humans
  • Models, Statistical*
  • Regression Analysis
  • Software