Haplotype synthesis analysis reveals functional variants underlying known genome-wide associated susceptibility loci

Bioinformatics. 2016 Jul 15;32(14):2136-42. doi: 10.1093/bioinformatics/btw125. Epub 2016 Mar 21.

Abstract

Motivation: The functional mechanisms underlying disease association remain unknown for genome-wide association study (GWAS) susceptibility variants located outside coding regions. Synthesis of effects from multiple surrounding functional variants has been suggested as an explanation of hard-to-interpret findings. We define filter criteria, based on linkage disequilibrium measures and allele frequencies, that reflect expected properties of synthesizing variant sets. For eligible candidate sets, we search for haplotype markers that are highly correlated with associated variants.
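The search for a haplotype marker that is highly correlated with an associated variant can be illustrated with a minimal sketch. This is not the GetSynth implementation: the function names (`r_squared`, `best_haplotype_marker`), the 0/1 allele coding, the use of r² as the correlation measure, and the toy haplotypes are all assumptions made for illustration, and the filter criteria on linkage disequilibrium and allele frequency are omitted.

```python
from itertools import product


def r_squared(x, y):
    """Squared correlation (r^2) between two equally long 0/1 sequences."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py                      # disequilibrium coefficient D
    denom = px * (1 - px) * py * (1 - py)
    return 0.0 if denom == 0 else d * d / denom


def best_haplotype_marker(candidate_haps, index_alleles):
    """Return (pattern, r2) for the allele pattern over the candidate
    variants whose presence/absence best tracks the index allele.

    candidate_haps: one tuple of 0/1 alleles per phased haplotype
    index_alleles:  0/1 allele of the index SNP on the same haplotypes
    """
    k = len(candidate_haps[0])
    best_pattern, best_r2 = None, 0.0
    for pattern in product((0, 1), repeat=k):
        # 1 if this haplotype carries exactly the pattern, else 0
        indicator = [1 if h == pattern else 0 for h in candidate_haps]
        r2 = r_squared(indicator, index_alleles)
        if r2 > best_r2:
            best_pattern, best_r2 = pattern, r2
    return best_pattern, best_r2


# Toy data (hypothetical): 8 phased haplotypes at three candidate variants.
haps = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 1, 0),
        (0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0)]
index = [1, 1, 0, 0, 0, 0, 0, 1]           # index SNP allele per haplotype

pattern, r2 = best_haplotype_marker(haps, index)
single_r2 = [r_squared([h[i] for h in haps], index) for i in range(3)]
```

In this toy data set no single candidate variant is fully correlated with the index SNP, whereas the three-variant pattern `(1, 1, 0)` is, which is the situation the synthesis hypothesis describes.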

Results: Via simulations we assess the performance of our approach and suggest parameter settings which guarantee 95% sensitivity at 20-fold reduced computational cost. We apply our method to 1000 Genomes data and confirmed Crohn's Disease (CD) and Type 2 Diabetes (T2D) variants. Of these, 36.9% could be explained by three-variant haplotypes carrying at least two functional variants, as compared to 16.4% for random variants ([Formula: see text]). Association could be explained by missense variants for MUC19, PER3 (CD) and HMG20A (T2D). In a CD GWAS, imputed using Haplotype Reference Consortium data (64 976 haplotypes), we could confirm the syntheses of MUC19 and PER3 and identified synthesis by missense variants for 6 further genes (ZGPAT, GPR65, CLN3/NPIPB8, LOC102723878, rs2872507, GCKR). In all instances, the odds ratios of the synthesizing haplotypes were virtually identical to that of the index SNP. In summary, we demonstrate the potential of synthesis analysis to guide functional follow-up of GWAS findings.

Availability and implementation: All methods are implemented in the C/C++ toolkit GetSynth, available at http://sourceforge.net/projects/getsynth/

Contact: tim.becker@uni-greifswald.de

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Computational Biology / methods*
  • Diabetes Mellitus, Type 2 / genetics
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study*
  • Haplotypes*
  • Humans
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide