Sample size requirements to detect gene-environment interactions in genome-wide association studies

Genet Epidemiol. 2011 Apr;35(3):201-10. doi: 10.1002/gepi.20569. Epub 2011 Feb 9.

Abstract

Many complex diseases are likely to be a result of the interplay of genes and environmental exposures. The standard analysis in a genome-wide association study (GWAS) scans for main effects and ignores the potentially useful information in the available exposure data. Two recently proposed methods that exploit environmental exposure information involve a two-step analysis aimed at prioritizing the large number of SNPs tested to highlight those most likely to be involved in a gene-environment (G-E) interaction. For example, Murcray et al. ([2009] Am J Epidemiol 169:219–226) proposed screening on a test that models the G-E association induced by an interaction in the combined case-control sample. Alternatively, Kooperberg and LeBlanc ([2008] Genet Epidemiol 32:255–263) suggested screening on genetic marginal effects. In both methods, SNPs that pass the respective screening step at a pre-specified significance threshold are followed up with a formal test of interaction in the second step. We propose a hybrid method that combines these two screening approaches by allocating a proportion of the overall genome-wide significance level to each test. We show that the Murcray et al. approach is often the most efficient method, but that the hybrid approach is a powerful and robust method for nearly any underlying model. As an example, for a GWAS of 1 million markers including a single true disease SNP with minor allele frequency of 0.15, and a binary exposure with prevalence 0.3, the Murcray, Kooperberg, and hybrid methods are 1.90, 1.27, and 1.87 times as efficient, respectively, as the traditional case-control analysis to detect an interaction effect size of 2.0.
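
The two-step logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes that per-SNP p-values for the G-E association screen, the marginal-effect screen, and the formal interaction test have already been computed, and that step 2 applies a Bonferroni correction over the SNPs that passed the screen. The function names (`two_step`, `hybrid`), the parameter `rho` (the share of the overall significance level given to the G-E screening path), and the default thresholds are hypothetical choices made for illustration.

```python
def two_step(screen_p, interaction_p, screen_alpha, overall_alpha):
    """Return indices of SNPs declared significant by a two-step procedure.

    Step 1: keep SNPs whose screening p-value is below screen_alpha.
    Step 2: test only those SNPs for interaction, with a Bonferroni
            correction over the number that passed (an assumption here).
    """
    passed = [i for i, p in enumerate(screen_p) if p < screen_alpha]
    if not passed:
        return []
    threshold = overall_alpha / len(passed)
    return [i for i in passed if interaction_p[i] < threshold]


def hybrid(ge_screen_p, marginal_screen_p, interaction_p,
           ge_screen_alpha=0.05, marginal_screen_alpha=0.05,
           overall_alpha=0.05, rho=0.5):
    """Combine both screens by splitting the overall significance level.

    rho * overall_alpha is spent on SNPs screened by G-E association;
    (1 - rho) * overall_alpha on SNPs screened by marginal effect.
    A SNP is reported if it is significant along either path.
    """
    ge_hits = two_step(ge_screen_p, interaction_p,
                       ge_screen_alpha, rho * overall_alpha)
    marginal_hits = two_step(marginal_screen_p, interaction_p,
                             marginal_screen_alpha, (1 - rho) * overall_alpha)
    return sorted(set(ge_hits) | set(marginal_hits))
```

Calling `hybrid` with `rho=1.0` reduces the sketch to screening on G-E association alone, and `rho=0.0` reduces it to screening on marginal effects alone, which mirrors the way the hybrid method is described as sitting between the two published approaches.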

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Comment

MeSH terms

  • Case-Control Studies
  • Disease / genetics
  • Environment
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Logistic Models
  • Models, Genetic
  • Molecular Epidemiology / statistics & numerical data
  • Polymorphism, Single Nucleotide
  • Sample Size
  • Software