Assessment of significance of conditionally independent GWAS signals

Bioinformatics. 2021 Oct 25;37(20):3521-3529. doi: 10.1093/bioinformatics/btab332.

Abstract

Motivation: Multiple independently associated SNPs within a single linkage disequilibrium region are common. Conditional analysis has been successful in identifying such secondary signals. Although conditional association tests are restricted to specific genomic regions, they are typically benchmarked against the genome-wide significance criterion, which is a conservative strategy. Within the weighted hypothesis testing framework, we developed a 'quasi-adaptive' method that uses the pairwise correlation (r²) with, and the physical distance (d) from, the index association to construct priority functions G = G(r², d), which assign an SNP-specific α-threshold to each SNP. Family-wise error rate (FWER) and power of the approach were evaluated via simulations based on real GWAS data. We compared a series of different G-functions.
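
The weighting idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`r2_weight`, `distance_weight`, `priority`, `snp_thresholds`), the distance cut-offs, the linear r² component, the squaring of the distance term and the default α = 0.05 are all assumptions made for illustration, since the abstract does not specify them. The one property the sketch relies on is that of a weighted Bonferroni procedure: SNP-specific thresholds proportional to G(r², d) that sum to the overall α.

```python
from typing import List, Sequence


def r2_weight(r2: float, optimum: float = 0.3) -> float:
    """Hypothetical LD component: 1 at the optimum r2, decreasing linearly away from it."""
    span = max(optimum, 1.0 - optimum)
    return max(0.0, 1.0 - abs(r2 - optimum) / span)


def distance_weight(d_bp: int) -> float:
    """Hypothetical step-wise distance component (cut-offs are illustrative only)."""
    if d_bp <= 100_000:
        return 1.0
    if d_bp <= 500_000:
        return 0.5
    return 0.1


def priority(r2: float, d_bp: int) -> float:
    """Illustrative G(r2, d); squaring the distance term gives d more influence than r2."""
    return distance_weight(d_bp) ** 2 * r2_weight(r2)


def snp_thresholds(
    r2s: Sequence[float], dists_bp: Sequence[int], alpha: float = 0.05
) -> List[float]:
    """Split alpha across SNPs in proportion to G, so the thresholds sum to alpha."""
    g = [priority(r2, d) for r2, d in zip(r2s, dists_bp)]
    total = sum(g)
    if total <= 0:
        raise ValueError("all priorities are zero; cannot allocate alpha")
    return [alpha * gi / total for gi in g]


if __name__ == "__main__":
    # Three made-up candidate SNPs: (r2 with the index SNP, distance in base pairs).
    r2s = [0.3, 0.3, 0.9]
    dists = [50_000, 800_000, 50_000]
    for r2, d, t in zip(r2s, dists, snp_thresholds(r2s, dists)):
        print(f"r2={r2:.1f} d={d:>7} bp  alpha_i={t:.3e}")
```

In this sketch, a nearby SNP at the assumed optimal r² receives the largest share of α, while distant SNPs and SNPs in very high LD with the index SNP receive much stricter thresholds.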

Results: Simulations under the null hypothesis on 1,100 primary SNPs confirmed appropriate empirical FWER for all G-functions. A G-function with an optimal r² of 0.3 between index and secondary SNP, which strongly down-weighted SNPs at greater distances in a step-wise manner and placed more emphasis on d than on r², had the best overall power. It also gave the best results when applied to the real datasets. As a proof of concept, the 'quasi-adaptive' method was applied to GWAS on free thyroxine (FT4), inflammatory bowel disease (IBD) and human height. The algorithm revealed 5 secondary signals in our example GWAS on FT4, 5 secondary signals for IBD and 19 secondary signals for human height that would have gone undetected with the established genome-wide threshold (α = 5×10⁻⁸).
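
The notion of an empirical FWER check can be sketched in a heavily simplified form. The example below is an assumption-laden toy, not the simulation design of the paper: it draws independent uniform null p-values instead of using real GWAS data, and the function name `empirical_fwer`, the weights, the number of SNPs and the overall α = 0.05 are hypothetical. It only shows that unequal thresholds summing to α keep the fraction of replicates with at least one false positive near or below α.

```python
import random
from typing import Sequence


def empirical_fwer(thresholds: Sequence[float], n_rep: int = 20_000, seed: int = 1) -> float:
    """Fraction of null replicates in which at least one p-value falls below its threshold.

    Assumes independent Uniform(0, 1) p-values, which ignores LD between SNPs.
    """
    rng = random.Random(seed)
    false_positive_reps = 0
    for _ in range(n_rep):
        if any(rng.random() < t for t in thresholds):
            false_positive_reps += 1
    return false_positive_reps / n_rep


if __name__ == "__main__":
    alpha = 0.05
    # Hypothetical priorities: 5 favoured SNPs and 15 down-weighted SNPs.
    weights = [1.0] * 5 + [0.25] * 15
    total = sum(weights)
    thresholds = [alpha * w / total for w in weights]
    print(f"empirical FWER: {empirical_fwer(thresholds):.4f} (target {alpha})")
```

Because the thresholds sum to α, the Bonferroni inequality bounds the FWER by α regardless of how the weights are distributed, provided the weights are fixed without reference to the p-values being tested.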

Availability and implementation: https://github.com/sghasemi64/Secondary-Signal.

Supplementary information: Supplementary data are available at Bioinformatics online.