A fast algorithm for Bayesian multi-locus model in genome-wide association studies

Mol Genet Genomics. 2017 Aug;292(4):923-934. doi: 10.1007/s00438-017-1322-4. Epub 2017 May 22.

Abstract

Genome-wide association studies (GWAS) have identified a large number of single-nucleotide polymorphisms (SNPs) associated with complex traits. A recently developed linear mixed model for estimating heritability by simultaneously fitting all SNPs suggests that common variants can explain a substantial fraction of heritability, which hints at the low power of the single-variant analysis typically used in GWAS. Consequently, many multi-locus shrinkage models have been proposed under a Bayesian framework. However, most rely on Markov chain Monte Carlo (MCMC) algorithms, which are time-consuming and challenging to apply to GWAS data. Here, we propose a fast algorithm for the Bayesian adaptive lasso using variational inference (BAL-VI). Extensive simulations and real data analysis indicate that our model outperforms the well-known Bayesian lasso and Bayesian adaptive lasso models in both accuracy and speed. BAL-VI can complete a simultaneous analysis of a lung cancer GWAS dataset with ~3400 subjects and ~570,000 SNPs in about half a day.
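The abstract does not spell out the BAL-VI updates, so the following is only a minimal sketch of how mean-field variational inference can replace MCMC for an adaptive-lasso-type model. It is not the authors' implementation. The hierarchy is an assumption chosen for illustration: y = Xβ + ε with ε ~ N(0, σ²I), β_j ~ N(0, τ_j²), τ_j² ~ Exponential(λ_j²/2), λ_j² ~ Gamma(a0, b0), and p(σ²) ∝ 1/σ². The function name `bal_vi_sketch`, the hyperparameters `a0` and `b0`, and the simulated data are all hypothetical.

```python
import numpy as np


def bal_vi_sketch(X, y, a0=1.0, b0=0.1, n_iter=200, tol=1e-6):
    """Mean-field VI for an assumed adaptive-lasso hierarchy (illustration only).

    Returns the variational posterior mean and covariance of beta.
    """
    n, p = X.shape
    XtX = X.T @ X
    Xty = X.T @ y

    # Initial values for the expectations that the updates depend on.
    e_lambda2 = np.ones(p)          # E[lambda_j^2]
    e_inv_tau2 = np.ones(p)         # E[1 / tau_j^2]
    e_inv_sigma2 = 1.0 / np.var(y)  # E[1 / sigma^2]
    mu = np.zeros(p)

    for _ in range(n_iter):
        mu_old = mu

        # q(beta) = N(mu, Sigma)
        Sigma = np.linalg.inv(e_inv_sigma2 * XtX + np.diag(e_inv_tau2))
        mu = e_inv_sigma2 * (Sigma @ Xty)
        e_beta2 = mu**2 + np.diag(Sigma)  # E[beta_j^2]

        # q(tau_j^2) is generalized inverse Gaussian, so 1/tau_j^2 is
        # inverse Gaussian with mean sqrt(E[lambda_j^2] / E[beta_j^2]).
        e_inv_tau2 = np.sqrt(e_lambda2 / e_beta2)
        e_tau2 = np.sqrt(e_beta2 / e_lambda2) + 1.0 / e_lambda2

        # q(lambda_j^2) = Gamma(a0 + 1, rate = b0 + E[tau_j^2] / 2)
        e_lambda2 = (a0 + 1.0) / (b0 + 0.5 * e_tau2)

        # q(sigma^2) = Inverse-Gamma(n / 2, E||y - X beta||^2 / 2)
        resid = y - X @ mu
        # sum(XtX * Sigma) equals trace(XtX @ Sigma) for symmetric matrices.
        e_rss = resid @ resid + np.sum(XtX * Sigma)
        e_inv_sigma2 = n / e_rss

        if np.max(np.abs(mu - mu_old)) < tol:
            break

    return mu, Sigma


# Small simulated example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 5, 10]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

mu, Sigma = bal_vi_sketch(X, y)
top = set(np.argsort(-np.abs(mu))[:3].tolist())
print(sorted(top))
print(np.round(mu[[0, 5, 10]], 2))
```

Each pass replaces a round of MCMC sampling with closed-form updates of expectations, which is the general reason variational schemes tend to be faster. The sketch inverts a p × p matrix on every iteration, so it is only practical for small p; it says nothing about how the paper scales to ~570,000 SNPs.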

Keywords: Bayesian adaptive lasso; Genome-wide association studies; Multi-locus model; Variable selection; Variational inference.

MeSH terms

  • Algorithms*
  • Bayes Theorem*
  • Computational Biology / methods*
  • Computer Simulation
  • Genome-Wide Association Study / methods*
  • Humans
  • Markov Chains
  • Models, Genetic
  • Monte Carlo Method
  • Polymorphism, Single Nucleotide / genetics*
  • Quantitative Trait, Heritable*