Identification of grouped rare and common variants via penalized logistic regression

Genet Epidemiol. 2013 Sep;37(6):592-602. doi: 10.1002/gepi.21746. Epub 2013 Jul 8.

Abstract

In spite of the success of genome-wide association studies in finding many common variants associated with disease, these variants seem to explain only a small proportion of the estimated heritability. Data collection has turned toward exome and whole-genome sequencing, but it is well known that the single-marker methods frequently used for common variants have low power to detect rare variants associated with disease, even with very large sample sizes. In response, a variety of methods have been developed that attempt to cluster rare variants so that they may gather strength from one another, under the premise that there may be multiple causal variants within a gene. Most of these methods group variants by gene or proximity, and test one gene or marker window at a time. We propose a penalized regression method (PeRC) that analyzes all genes at once, allowing grouping of all (rare and common) variants within a gene, along with subgrouping of the rare variants, thus borrowing strength from both rare and common variants within the same gene. The method can incorporate either a burden-based weighting of the rare variants or one in which the weights are data-driven. In simulations, our method performs favorably when compared to many previously proposed approaches, including its predecessor, the sparse group lasso [Friedman et al., 2010].
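
To make two of the ingredients named above concrete, the following is a minimal sketch, not the PeRC implementation, whose exact penalty the abstract does not specify. It shows (a) a simple unweighted burden score that collapses the rare variants within each gene, and (b) the value of a sparse group lasso penalty in its basic unweighted form, a group-wise L2 term plus an overall L1 term. The function names, the `maf_threshold` cutoff, and the tuning parameters `lam_group` and `lam_l1` are assumptions introduced here for illustration only.

```python
import numpy as np


def burden_scores(genotypes, gene_of_variant, maf_threshold=0.01):
    """Collapse rare variants into one unweighted burden score per gene.

    genotypes: (n_samples, n_variants) array of minor-allele counts (0, 1, 2).
    gene_of_variant: length n_variants sequence of gene labels.
    maf_threshold: hypothetical cutoff; variants with sample minor allele
        frequency below it are treated as rare.
    """
    genotypes = np.asarray(genotypes)
    genes = np.asarray(gene_of_variant)
    # Sample minor allele frequency: mean allele count divided by 2.
    maf = genotypes.mean(axis=0) / 2.0
    rare = maf < maf_threshold
    scores = {}
    for gene in np.unique(genes):
        mask = rare & (genes == gene)
        # Sum of rare minor-allele counts per sample (zeros if none are rare).
        scores[str(gene)] = genotypes[:, mask].sum(axis=1)
    return scores


def sparse_group_lasso_penalty(beta, gene_of_variant, lam_group, lam_l1):
    """Evaluate lam_group * sum_g ||beta_g||_2 + lam_l1 * ||beta||_1.

    This is the basic unweighted sparse group lasso form, with one group
    per gene; it is an illustration, not the PeRC penalty.
    """
    beta = np.asarray(beta, dtype=float)
    genes = np.asarray(gene_of_variant)
    group_term = sum(
        np.linalg.norm(beta[genes == gene]) for gene in np.unique(genes)
    )
    return lam_group * group_term + lam_l1 * np.abs(beta).sum()


# Toy data: 4 samples, 3 variants, two genes (labels are made up).
G = [[0, 1, 1],
     [0, 0, 2],
     [1, 0, 1],
     [0, 0, 0]]
gene_labels = ["A", "A", "B"]

scores = burden_scores(G, gene_labels, maf_threshold=0.2)
penalty = sparse_group_lasso_penalty([3.0, 4.0, 0.0], gene_labels, 1.0, 1.0)
```

In this toy example the two variants in gene A fall below the illustrative frequency cutoff and are summed per sample, while the single variant in gene B is common and contributes nothing to its burden score. The group term shrinks whole genes toward zero together, and the L1 term additionally encourages sparsity among individual variants within a retained gene.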

Keywords: association analysis; elastic net; lasso; penalized likelihood; rare variants.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Gene Frequency
  • Genetic Variation*
  • Genetics, Population
  • Humans
  • Logistic Models*
  • Models, Genetic*
  • Polymorphism, Single Nucleotide