A Simple Scalable Association Hypothesis Test Combining Gene-wide Evidence From Multiple Polymorphisms

Br J Med Med Res. 2014 Mar;4(6):1413-1422. doi: 10.9734/bjmmr/2014/6117.

Abstract

Aims: In single-nucleotide polymorphism (SNP) scans, SNP-phenotype association hypotheses are tested; however, there is biological interpretation only for genes that span multiple SNPs. We demonstrate and validate a method of combining gene-wide evidence using data for high-density lipoprotein cholesterol (HDLC).

Methodology: In a family-based study (N=1782 from 482 families), we used 1000 phenotype-permuted datasets to determine the correlation of z-test statistics for 592 SNP-HDLC association tests covering 14 genes previously reported to be associated with HDLC. We generated gene-wide p-values using the distribution of the sum of correlated z-statistics.
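
The sum-of-correlated-z-statistics idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that under the null hypothesis each SNP z-statistic is standard normal, so the sum has mean zero and variance equal to the sum of all entries of the SNP correlation matrix. The function name `gene_wide_p_value`, the two-sided test, and the toy inputs are assumptions made for illustration.

```python
import math


def gene_wide_p_value(z_stats, corr):
    """Two-sided p-value for the sum of correlated z-statistics.

    z_stats: list of per-SNP z-statistics for one gene.
    corr:    n x n correlation matrix of those z-statistics under the null
             (for example, estimated from phenotype-permuted datasets).
    Assumption: each z-statistic is standard normal under the null.
    """
    n = len(z_stats)
    if len(corr) != n or any(len(row) != n for row in corr):
        raise ValueError("corr must be an n x n matrix matching z_stats")

    total = sum(z_stats)
    # Var(sum of z_i) = sum over i, j of corr[i][j] when each z_i has variance 1.
    variance = sum(sum(row) for row in corr)
    if variance <= 0:
        raise ValueError("sum of correlation entries must be positive")

    z_gene = total / math.sqrt(variance)
    # Two-sided tail probability of a standard normal variate.
    return math.erfc(abs(z_gene) / math.sqrt(2))


# Toy inputs (hypothetical): four SNPs, each with z = 1.
z = [1.0, 1.0, 1.0, 1.0]
independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
fully_correlated = [[1.0] * 4 for _ in range(4)]

p_independent = gene_wide_p_value(z, independent)
p_correlated = gene_wide_p_value(z, fully_correlated)
```

With independent SNPs the four modest signals reinforce each other, whereas with perfectly correlated SNPs they count as a single test, which is why the correlation matrix has to enter the variance.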

Results: Of the 14 genes, CETP was significant (p=4.0×10⁻⁵ < 0.05/14), while PLTP was borderline significant (p=6.7×10⁻³ < 0.1/14). These p-values were confirmed using empirical distributions of the sum of χ² association statistics as a gold standard (2.9×10⁻⁶ and 1.8×10⁻³, respectively). Gene-wide p-values were more significant than the Bonferroni-corrected p-value for the most significant SNP in 11 of 14 genes (p=0.023). Gene-wide p-values calculated from SNP correlations derived for 20 simulated normally distributed phenotypes reproduced those derived from the 1000 phenotype-permuted datasets, and both sets were correlated with the empirical distributions (Spearman correlation = 0.92 for both).
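
The Bonferroni comparisons quoted above can be checked with a few lines of arithmetic. The sketch below only reproduces the thresholds 0.05/14 and 0.1/14 and compares them with the reported CETP and PLTP p-values; the helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


n_genes = 14
reported = {"CETP": 4.0e-5, "PLTP": 6.7e-3}  # gene-wide p-values from the abstract

strict = bonferroni_threshold(0.05, n_genes)      # about 0.00357
borderline = bonferroni_threshold(0.10, n_genes)  # about 0.00714

for gene, p in reported.items():
    if p < strict:
        label = "significant"
    elif p < borderline:
        label = "borderline"
    else:
        label = "not significant"
    print(f"{gene}: p={p:.1e} -> {label}")
```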

Conclusion: We have validated a simple scalable method to combine polymorphism-level evidence into gene-wide statistical evidence. High-throughput gene-wide hypothesis tests may be used in biologically interpretable genome-wide association scans. Gene-wide association tests may be used to meaningfully replicate findings in populations with different linkage disequilibrium structure, when SNP-level replication is not expected.

Keywords: Bonferroni; combining evidence; hypothesis tests.