Efficiently identifying significant associations in genome-wide association studies

Emrah Kostem; Eleazar Eskin

doi:10.1089/cmb.2013.0087

J Comput Biol. 2013 Oct;20(10):817-30. doi: 10.1089/cmb.2013.0087. Epub 2013 Sep 14.

Authors

Emrah Kostem¹, Eleazar Eskin

Affiliation

¹ Computer Science Department, University of California, Los Angeles, California.

Abstract

Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome that harbor variation affecting gene expression, or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, in eQTL studies, tens of thousands of gene expression levels are measured, and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to the state-of-the-art testing approaches by a factor of 75.
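
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how proxies are chosen or how candidate regions are located, so the sketch assumes evenly placed proxies, a lenient first-stage threshold on the proxy statistic, and a fixed window of neighboring SNPs around each flagged proxy. The names `assoc_stats`, `two_stage_scan`, `window`, `stage1_z`, and `final_z` are hypothetical, and a simple correlation-based statistic stands in for whatever test (for example, a mixed model) a real study would use. Unlike the published procedure, this heuristic carries no guarantee of recovering every significant association.

```python
import numpy as np


def assoc_stats(genotypes, phenotype):
    """Per-SNP association z-scores, computed as sqrt(n) * Pearson correlation.

    genotypes: (n_individuals, n_snps) array; phenotype: (n_individuals,) array.
    Monomorphic SNPs (zero variance) receive a z-score of 0.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    n = phenotype.shape[0]
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    num = (g * y[:, None]).sum(axis=0)
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    r = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return np.sqrt(n) * r


def two_stage_scan(genotypes, phenotype, proxy_idx, window, stage1_z, final_z):
    """Test proxies first, then only SNPs near proxies that look promising.

    Assumption (not from the paper): a region is 'promising' when its proxy
    reaches |z| >= stage1_z, and the region is the proxy +/- `window` SNPs.
    Returns the sorted indices of SNPs with |z| >= final_z and the number of
    tests actually performed.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    proxy_idx = np.asarray(proxy_idx, dtype=int)
    n_snps = genotypes.shape[1]

    # Stage 1: test only the proxy SNPs.
    proxy_z = assoc_stats(genotypes[:, proxy_idx], phenotype)

    # Mark the neighborhood of every proxy that passes the lenient threshold.
    follow_up = np.zeros(n_snps, dtype=bool)
    for p in proxy_idx[np.abs(proxy_z) >= stage1_z]:
        follow_up[max(0, p - window): p + window + 1] = True
    follow_up[proxy_idx] = False  # proxies were already tested in stage 1

    # Stage 2: test the remaining SNPs in the flagged regions.
    extra_idx = np.flatnonzero(follow_up)
    extra_z = assoc_stats(genotypes[:, extra_idx], phenotype)

    all_idx = np.concatenate([proxy_idx, extra_idx])
    all_z = np.concatenate([proxy_z, extra_z])
    hits = np.sort(all_idx[np.abs(all_z) >= final_z])
    return hits, len(all_idx)
```

In this sketch the saving comes from skipping regions whose proxy shows no signal, so the number of tests is the proxy count plus the SNPs in flagged windows rather than the full SNP count. How much is saved, and how many true associations are missed, depends on how well the proxies tag their neighbors and on the first-stage threshold.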

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Computer Simulation
Gene Frequency
Genome, Human
Genome-Wide Association Study / methods*
Haplotypes
Human Genome Project
Humans
Models, Genetic
Polymorphism, Single Nucleotide
