BG2: Bayesian variable selection in generalized linear mixed models with nonlocal priors for non-Gaussian GWAS data

BMC Bioinformatics. 2023 Sep 15;24(1):343. doi: 10.1186/s12859-023-05468-w.

Abstract

Background: Genome-wide association studies (GWASes) aim to identify single nucleotide polymorphisms (SNPs) associated with a given phenotype. A common approach for the analysis of GWAS is single marker analysis (SMA) based on linear mixed models (LMMs). However, LMM-based SMA usually yields a large number of false discoveries and cannot be directly applied to non-Gaussian phenotypes such as count data.

Results: We present a novel Bayesian method to find SNPs associated with non-Gaussian phenotypes. To that end, we use generalized linear mixed models (GLMMs) and, thus, call our method Bayesian GLMMs for GWAS (BG2). To deal with the high dimensionality of GWAS analysis, we propose novel nonlocal priors specifically tailored for GLMMs. In addition, we develop related fast approximate Bayesian computations. BG2 uses a two-step procedure: first, BG2 screens for candidate SNPs; second, BG2 performs model selection that considers all screened candidate SNPs as possible regressors. A simulation study shows favorable performance of BG2 when compared to GLMM-based SMA. We illustrate the usefulness and flexibility of BG2 with three case studies on cocaine dependence (binary data), alcohol consumption (count data), and number of root-like structures in a model plant (count data).

Keywords: Bayesian statistics; GLMM; GWAS; Nonlocal prior; Variable selection.

MeSH terms

  • Bayes Theorem
  • Computer Simulation
  • Genome-Wide Association Study*
  • Linear Models
  • Phenotype