Efficient identification of context dependent subgroups of risk from genome-wide association studies

Stat Appl Genet Mol Biol. 2014 Apr 1;13(2):217-26. doi: 10.1515/sagmb-2013-0062.

Abstract

We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. The proposed strategy reaches a computational limit when the number of genomic variations (predictor variables) under study is large (>500), because permutations are used to generate a null distribution for testing the significance of each term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors in the evaluation of terms in the peeling and pasting processes of a PRIM analysis. When a permutation-based hypothesis test is employed, these errors are under-estimated and non-existent, respectively. The resultant savings in computational time make it possible to consider larger numbers of genomic variations (an example genome-wide association study is given) in the selection of statistically significant terms in the formulation of PRIM prediction models.
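
The computational contrast described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify their theoretical strategy. The sketch assumes a binary disease outcome and a single, fixed candidate term (a subgroup membership mask). All function names are hypothetical. The closed-form hypergeometric tail is only a stand-in showing why an analytic calculation is cheaper than permutation; it ignores the selection among many candidate terms that peeling and pasting involve.

```python
import math
import random


def subgroup_case_count(outcome, in_subgroup):
    """Count cases (outcome == 1) among individuals inside the subgroup."""
    return sum(y for y, inside in zip(outcome, in_subgroup) if inside)


def permutation_p_value(outcome, in_subgroup, n_perm=2000, seed=0):
    """One-sided permutation p-value for an excess of cases in the subgroup.

    Outcome labels are shuffled n_perm times while subgroup membership stays
    fixed, so the cost grows with n_perm for every term that is tested.
    """
    rng = random.Random(seed)
    observed = subgroup_case_count(outcome, in_subgroup)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if subgroup_case_count(shuffled, in_subgroup) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def hypergeometric_tail(n_total, n_cases, n_subgroup, k_observed):
    """Exact P(X >= k_observed) for the case count of a fixed subgroup.

    Illustrative assumption: under random relabelling, the number of cases
    in a fixed subgroup follows a hypergeometric distribution.
    """
    denom = math.comb(n_total, n_subgroup)
    upper = min(n_cases, n_subgroup)
    numer = sum(
        math.comb(n_cases, x) * math.comb(n_total - n_cases, n_subgroup - x)
        for x in range(k_observed, upper + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Toy data: 20 cases and 20 non-cases; the subgroup holds 8 cases, 2 non-cases.
    outcome = [1] * 20 + [0] * 20
    in_subgroup = [i < 8 or i in (20, 21) for i in range(40)]
    print("permutation:", permutation_p_value(outcome, in_subgroup))
    print("closed form:", hypergeometric_tail(40, 20, 10, 8))
```

For a single fixed term the two p-values should agree up to Monte Carlo error, but the permutation version must be repeated for every candidate term, which is the cost that becomes limiting as the number of predictor variables grows.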

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Genetic Variation*
  • Genome, Human
  • Genome-Wide Association Study*
  • Genomics / methods
  • Humans
  • Models, Genetic*
  • Risk Factors