Integrative genetic risk prediction using non-parametric empirical Bayes classification

Biometrics. 2017 Jun;73(2):582-592. doi: 10.1111/biom.12619. Epub 2016 Oct 28.

Abstract

Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free non-parametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa.
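The empirical Bayes idea summarized above can be illustrated with a small sketch. It is not the interface of the ssa package or the exact algorithm of the article. It is a generic grid-based non-parametric maximum likelihood fit under assumptions made here for illustration: each marker contributes one target z-score and one auxiliary z-score, both are normal with unit variance around unknown effects, markers are independent, and the joint prior on the effect pairs is supported on a fixed grid. All function names, the grid, and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def fit_npmle(z_target, z_aux, grid, n_iter=30):
    """EM for the mixing weights of a prior on a fixed 2-D grid.

    Each atom is a candidate (target effect, auxiliary effect) pair.
    Only summary z-scores are used; no individual-level data are needed.
    """
    atoms = [(a, b) for a in grid for b in grid]
    # lik[i][k]: likelihood of marker i's z-score pair under atom k
    lik = [
        [normal_pdf(zt, a) * normal_pdf(za, b) for (a, b) in atoms]
        for zt, za in zip(z_target, z_aux)
    ]
    weights = [1.0 / len(atoms)] * len(atoms)
    for _ in range(n_iter):
        totals = [0.0] * len(atoms)
        for row in lik:
            denom = sum(w * l for w, l in zip(weights, row))
            for k, l in enumerate(row):
                totals[k] += weights[k] * l / denom  # posterior responsibility
        weights = [t / len(lik) for t in totals]
    return atoms, weights


def posterior_mean_target(zt, za, atoms, weights):
    """Posterior mean of the target effect given both z-scores."""
    num = 0.0
    den = 0.0
    for (a, b), w in zip(atoms, weights):
        p = w * normal_pdf(zt, a) * normal_pdf(za, b)
        num += a * p
        den += p
    return num / den


# Simulated summary statistics (illustrative only): half of the markers
# share a non-zero effect in both diseases, the rest are null in both.
random.seed(1)
effects = [2.0 if i % 2 == 0 else 0.0 for i in range(200)]
z_target = [e + random.gauss(0.0, 1.0) for e in effects]
z_aux = [e + random.gauss(0.0, 1.0) for e in effects]

grid = [float(g) for g in range(-4, 5)]
atoms, weights = fit_npmle(z_target, z_aux, grid)

# Same target z-score, different auxiliary evidence.
print(posterior_mean_target(1.0, 3.0, atoms, weights))
print(posterior_mean_target(1.0, 0.0, atoms, weights))
```

Because the prior is estimated jointly over both effects, the amount of information borrowed from the auxiliary disease is determined by the data and not by a tuning parameter. If the two diseases shared no structure, the fitted weights would carry little dependence between the two coordinates and the auxiliary z-score would have little influence on the estimate.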

Keywords: Empirical Bayes; GWAS; Genetic risk prediction; High-dimensional classification; Integrative genomics; Non-parametric maximum likelihood.

MeSH terms

  • Algorithms
  • Bayes Theorem*
  • Biometry
  • Humans
  • Risk Factors
  • Sample Size