Group association test using a hidden Markov model

Biostatistics. 2016 Apr;17(2):221-34. doi: 10.1093/biostatistics/kxv035. Epub 2015 Sep 28.

Abstract

In the genomic era, group association tests are of great interest. Due to the overwhelming number of individual genomic features, the power of testing for association of a single genomic feature at a time is often very small, as are the effect sizes for most features. Many methods have been proposed to test association of a trait with a group of features within a functional unit as a whole, e.g. all SNPs in a gene, yet few of these methods account for the fact that generally a substantial proportion of the features are not associated with the trait. In this paper, we propose to model the association for each feature in the group as a mixture of features with no association and features with non-zero associations to explicitly account for the possibility that a fraction of features may not be associated with the trait while other features in the group are. The feature-level associations are first estimated by generalized linear models; the sequence of these estimated associations is then modeled by a hidden Markov chain. To test for global association, we develop a modified likelihood ratio test based on a log-likelihood function that ignores higher order dependency plus a penalty term. We derive the asymptotic distribution of the likelihood ratio test under the null hypothesis. Furthermore, we obtain the posterior probability of association for each feature, which provides evidence of feature-level association and is useful for potential follow-up studies. In simulations and data application, we show that our proposed method performs well when compared with existing group association tests especially when there are only few features associated with the outcome.

Keywords: Finite mixture model; Genome-wide association study; Modified likelihood ratio test.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Interpretation, Statistical*
  • Genome-Wide Association Study / methods*
  • Humans
  • Likelihood Functions*
  • Markov Chains*