Enabling genome-wide association testing with multiple diseases and no healthy controls

Gene. 2019 Feb 5:684:118-123. doi: 10.1016/j.gene.2018.10.047. Epub 2018 Oct 23.

Abstract

Motivation: Although large-scale whole genome sequencing is feasible, its high cost compels investigators to focus on disease subjects. As a result, large sequencing datasets of samples with different diseases are often readily available, but healthy controls to contrast them with are not. While it is possible to perform an association study using only disease samples, an association could be driven by a disease acting as a control rather than by the focal disease.

Methods: We developed a genotype-on-phenotype reverse regression with a Bayesian spike and slab prior to enable association testing in datasets with multiple diseases. This method, referred to as revreg, flagged associations (for both common and rare variants) that were driven by diseases other than the one of primary interest.
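The idea of regressing genotype on disease indicators and asking which diseases "switch on" can be illustrated with a minimal sketch. This is not the revreg implementation: it assumes a linear regression of allele dosage on 0/1 disease indicators, approximates each model's marginal likelihood with BIC, and evaluates an independent Bernoulli (spike and slab style) inclusion prior by enumerating all disease subsets. The names `simulate_cohort` and `posterior_inclusion`, the allele frequencies, and the sample sizes are all hypothetical.

```python
import itertools

import numpy as np


def simulate_cohort(freqs, n_per_disease=500, seed=0):
    """Draw allele dosages (0, 1, 2) per disease at hypothetical allele frequencies."""
    rng = np.random.default_rng(seed)
    disease = np.repeat(list(freqs), n_per_disease)
    genotype = np.concatenate(
        [rng.binomial(2, f, size=n_per_disease) for f in freqs.values()]
    )
    return genotype, disease


def posterior_inclusion(genotype, disease, prior_inclusion=0.5):
    """Approximate posterior inclusion probability of each disease indicator.

    Assumption (not from the paper): dosage is modeled by ordinary least
    squares, and exp(-BIC / 2) stands in for the marginal likelihood.
    """
    genotype = np.asarray(genotype, dtype=float)
    disease = np.asarray(disease)
    labels = np.unique(disease)
    n, k = genotype.size, labels.size
    # One 0/1 indicator column per disease (the "phenotype" predictors).
    indicators = np.column_stack([(disease == d).astype(float) for d in labels])

    subsets, log_post = [], []
    # Sizes 0..k-1 only: all k indicators plus an intercept is not identifiable.
    for size in range(k):
        for subset in itertools.combinations(range(k), size):
            cols = np.array(subset, dtype=int)
            X = np.column_stack([np.ones(n), indicators[:, cols]])
            beta, *_ = np.linalg.lstsq(X, genotype, rcond=None)
            rss = max(float(np.sum((genotype - X @ beta) ** 2)), 1e-12)
            bic = n * np.log(rss / n) + X.shape[1] * np.log(n)
            # Bernoulli prior: each disease is "slab" with prior_inclusion, else "spike".
            log_prior = size * np.log(prior_inclusion) + (k - size) * np.log1p(
                -prior_inclusion
            )
            subsets.append(subset)
            log_post.append(-0.5 * bic + log_prior)

    log_post = np.array(log_post)
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    # Inclusion probability = total posterior weight of subsets containing the disease.
    return {
        str(label): float(sum(w for w, s in zip(weights, subsets) if j in s))
        for j, label in enumerate(labels)
    }
```

Calling `posterior_inclusion(*simulate_cohort({"COPD": 0.30, "AMD": 0.45, "D3": 0.30, "D4": 0.30}))` returns one inclusion probability per disease label. In this toy setup a variant whose frequency differs only in the non-focal disease should receive its weight on that disease rather than on the focal one, which is the kind of flag the abstract describes.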

Results: Based on simulations, revreg had 80% power to detect an odds ratio of 1.74 for common variants (3500 samples total) and 3.73 for rare variants (14,000 samples total), with minimal type I error. For common variants, we tested the method on 3657 whole-genome-sequenced samples from a study aimed at discovering variants associated with risk of Chronic Obstructive Pulmonary Disease, using three other diseases as controls. We detected six highly significant associations likely due to Age-Related Macular Degeneration (AMD). In an exome dataset of 8836 samples aimed at characterizing rare variants associated with risk of Asthma, using five other diseases as controls, we detected and removed genic regions whose associations were due to AMD (C3, CFH, CFHR5, CFI, and DNMT3A) and RA (KRTAP13-4).
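The simulation settings behind these power figures are not given in the abstract. As a rough illustration of how simulation-based power is typically estimated for a single variant, the sketch below uses a plain two-sample allelic z-test rather than revreg. The function names, the control allele frequency, the significance threshold, and the sample sizes are hypothetical choices, so its output is not expected to reproduce the numbers reported above.

```python
from statistics import NormalDist

import numpy as np


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def simulated_power(n_cases, n_controls, control_freq, odds_ratio,
                    alpha=5e-8, n_sims=2000, seed=0):
    """Fraction of simulated datasets in which the allelic z-test rejects at alpha."""
    rng = np.random.default_rng(seed)
    case_freq = case_allele_frequency(control_freq, odds_ratio)
    # Each diploid sample contributes two alleles.
    case_alleles, ctrl_alleles = 2 * n_cases, 2 * n_controls
    case_alt = rng.binomial(case_alleles, case_freq, size=n_sims)
    ctrl_alt = rng.binomial(ctrl_alleles, control_freq, size=n_sims)

    diff = case_alt / case_alleles - ctrl_alt / ctrl_alleles
    pooled = (case_alt + ctrl_alt) / (case_alleles + ctrl_alleles)
    se = np.sqrt(pooled * (1.0 - pooled) * (1.0 / case_alleles + 1.0 / ctrl_alleles))
    # Guard against a zero standard error when no alternate allele is drawn.
    z = np.divide(diff, se, out=np.zeros_like(se), where=se > 0)

    threshold = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    return float(np.mean(np.abs(z) > threshold))
```

For example, `simulated_power(1000, 1000, control_freq=0.2, odds_ratio=1.5)` estimates power for one hypothetical design. Repeating the call over a grid of odds ratios and reading off where the estimate crosses 0.8 gives the kind of "80% power to detect an odds ratio of X" statement made in the Results.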

Keywords: Bayesian; Genetics; Reverse regression; Spike and slab prior.

MeSH terms

  • Asthma / genetics
  • Bayes Theorem
  • Case-Control Studies
  • Computer Simulation
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study / methods*
  • Humans
  • Macular Degeneration / genetics
  • Phenotype
  • Sequence Analysis, DNA / methods*
  • Whole Genome Sequencing / methods*