Learning Gaussian Graphical Models from Correlated Data

bioRxiv [Preprint]. 2024 Apr 5:2024.04.03.587948. doi: 10.1101/2024.04.03.587948.

Abstract

Gaussian Graphical Models (GGMs) are widely used in biomedical research to explore complex relationships among many variables. There are well-established procedures for building GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data, which yield correlated observations, and ignoring this correlation can inflate the Type I error. In this paper, we propose a bootstrap algorithm to infer a GGM from correlated data. Using extensive simulations of correlated data from family-based studies, we show that the bootstrap method does not inflate the Type I error and retains statistical power compared with alternative solutions. We apply our method to learn the GGM that represents complex relations among 47 polygenic risk scores generated from genome-wide genotype data in a family-based study, the Long Life Family Study. By comparing our method with conventional methods that ignore within-cluster correlation, we show that it controls the Type I error well in this real example.
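
The abstract does not spell out the steps of the proposed algorithm, so the following Python sketch is only a generic illustration of the underlying idea and should not be read as the authors' method. It assumes a cluster (family-level) bootstrap: whole clusters are resampled with replacement so that within-cluster correlation is preserved in every replicate, partial correlations are re-estimated each time, and an edge is kept when its percentile interval excludes zero. The function names, the percentile-interval edge rule, and the defaults (`n_boot`, `alpha`) are all assumptions made for illustration.

```python
import numpy as np


def partial_correlations(x):
    """Partial correlation matrix from the inverse sample covariance."""
    precision = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def cluster_bootstrap_ggm(x, cluster_ids, n_boot=500, alpha=0.05, seed=0):
    """Hypothetical cluster bootstrap for GGM edge selection.

    Assumption: clusters (e.g. families) are the resampling unit, and an
    edge is declared when the percentile interval of its partial
    correlation excludes zero. Not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    rows_by_cluster = [np.flatnonzero(cluster_ids == c) for c in clusters]

    p = x.shape[1]
    boots = np.empty((n_boot, p, p))
    for b in range(n_boot):
        # Draw clusters with replacement; keep all members of each draw.
        drawn = rng.integers(0, len(clusters), size=len(clusters))
        rows = np.concatenate([rows_by_cluster[i] for i in drawn])
        boots[b] = partial_correlations(x[rows])

    lower = np.quantile(boots, alpha / 2, axis=0)
    upper = np.quantile(boots, 1 - alpha / 2, axis=0)
    adjacency = (lower > 0) | (upper < 0)
    np.fill_diagonal(adjacency, False)
    return adjacency, lower, upper
```

Resampling clusters rather than individual observations is the design choice that matters here: it keeps related individuals together, so the bootstrap distribution reflects the dependence that a standard test assuming independent observations would ignore. The sketch needs more distinct observations than variables in each replicate, because it inverts the sample covariance directly.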

Publication types

  • Preprint