sEBM: Scaling Event Based Models to Predict Disease Progression via Implicit Biomarker Selection and Clustering

Inf Process Med Imaging. 2023 Jun:13939:208-221. doi: 10.1007/978-3-031-34048-2_17. Epub 2023 Jun 8.

Abstract

The Event Based Model (EBM) is a probabilistic generative model to explore biomarker changes occurring as a disease progresses. Disease progression is hypothesized to occur through a sequence of biomarker dysregulation "events". The EBM estimates the biomarker dysregulation event sequence. It computes the data likelihood for a given dysregulation sequence, and subsequently evaluates the posterior distribution on the dysregulation sequence. Since the posterior distribution is intractable, Markov Chain Monte-Carlo is employed to generate samples under the posterior distribution. However, the set of possible sequences increases as N! where N is the number of biomarkers (data dimension) and quickly becomes prohibitively large for effective sampling via MCMC. This work proposes the "scaled EBM" (sEBM) to enable event based modeling on large biomarker sets (e.g. high-dimensional data). First, sEBM implicitly selects a subset of biomarkers useful for modeling disease progression and infers the event sequence only for that subset. Second, sEBM clusters biomarkers with similar positions in the event sequence and only orders the "clusters", with each successive cluster corresponding to the next stage in disease progression. These two modifications used to construct the sEBM method provably reduces the possible space of event sequences by multiple orders of magnitude. The novel modifications are supported by theory and experiments on synthetic and real clinical data provides validation for sEBM to work in higher dimensional settings. Results on synthetic data with known ground truth shows that sEBM outperforms previous EBM variants as data dimensions increase. sEBM was successfully implemented with up to 300 biomarkers, which is a 6-fold increase over previous EBM applications. A real-world clinical application of sEBM is performed using 119 neuroimaging markers from publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) data to stratify subjects into 6 stages of disease progression. Subjects included cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer's Disease (AD). sEBM stage is differentiated for the 3 groups (χ2p-value<4.6e-32). Increased sEBM stage is a strong predictor of conversion risk to AD (p-value<2.3e-14) for MCI subjects, as verified with a Cox proportional-hazards model adjusted for age, sex, education and APOE4 status. Like EBM, sEBM does not rely on apriori defined diagnostic labels and only uses cross-sectional data.

Keywords: bayesian learning; biomarker clustering; disease progression modeling; prognostic biomarker selection.