Bayesian multitask learning regression for heterogeneous patient cohorts

Andre Goncalves; Priyadip Ray; Braden Soper; David Widemann; Mari Nygård; Jan F Nygård; Ana Paula Sales

doi:10.1016/j.yjbinx.2019.100059


J Biomed Inform. 2019;100S:100059. doi: 10.1016/j.yjbinx.2019.100059. Epub 2019 Oct 18.

Authors

Andre Goncalves¹, Priyadip Ray², Braden Soper², David Widemann², Mari Nygård³, Jan F Nygård³, Ana Paula Sales²

Affiliations

¹ Lawrence Livermore National Laboratory, Livermore, CA, USA. Electronic address: goncalves1@llnl.gov.
² Lawrence Livermore National Laboratory, Livermore, CA, USA.
³ Cancer Registry of Norway, Oslo, Norway.

PMID: 34384572

Abstract

Multitask learning (MTL) leverages commonalities across related tasks with the aim of improving individual task performance. A key modeling choice in designing MTL models is the structure of task relatedness, which may not be known in advance. Here we propose a Bayesian multitask learning model that is able to infer the task relationship structure directly from the data. We present two variants of the model that differ in the a priori information they encode about task relatedness. First, a diffuse Wishart prior is placed on a task precision matrix, so that all tasks are assumed to be equally related a priori. Second, a Bayesian graphical LASSO prior is placed on the task precision matrix to impose sparsity in the task relatedness. Motivated by machine learning applications in the biomedical domain, we emphasize interpretability and uncertainty quantification in our models. To support interpretability, we use linear mappings from the shared input space to the task-dependent output spaces. To support uncertainty quantification, we use conjugate priors so that full posterior inference is possible. Using synthetic data, we show that our model is able to recover the underlying task relationships as well as the features that are jointly relevant for all tasks. We demonstrate the utility of our model on three distinct biomedical applications: Alzheimer's disease progression, Parkinson's disease assessment, and cervical cancer screening compliance. Our model outperforms single-task learning (STL) models in predictive performance and outperforms existing MTL methods in the majority of scenarios.
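To illustrate why conjugate priors make full posterior inference tractable, the sketch below shows the Wishart case under an assumed parameterization that may differ from the paper's: the weights form a d × K matrix W (d features, K tasks), each row of W (one feature's weights across tasks) is drawn from N(0, Ω⁻¹), and the task precision matrix Ω has a Wishart(ν₀, S₀) prior. Under these assumptions Ω has a closed-form conditional posterior, Wishart(ν₀ + d, (S₀⁻¹ + WᵀW)⁻¹), and normalizing its off-diagonal entries into partial correlations is one way to read off inferred task relationships. The function names are hypothetical, and W is treated as known to isolate this single conjugate update; in a full Gibbs sampler this step would alternate with sampling W given Ω and the data. The graphical LASSO variant is not shown.

```python
import numpy as np


def sample_task_precision(W, nu0, S0, rng):
    """Draw Omega | W from its conditional Wishart posterior.

    Assumed (hypothetical) model, not necessarily the paper's exact one:
      each row of W (one feature, K tasks) ~ N(0, Omega^{-1}),
      Omega ~ Wishart(nu0, S0) with scale-matrix parameterization.
    Conjugacy gives Omega | W ~ Wishart(nu0 + d, (S0^{-1} + W^T W)^{-1}).
    Assumes nu0 is an integer so the simple sampler below applies.
    """
    d, K = W.shape
    df = nu0 + d
    scale = np.linalg.inv(np.linalg.inv(S0) + W.T @ W)
    scale = (scale + scale.T) / 2.0  # remove round-off asymmetry
    # For integer df >= K, a Wishart(df, scale) draw equals the sum of df
    # outer products of independent N(0, scale) vectors.
    Z = rng.multivariate_normal(np.zeros(K), scale, size=df)
    return Z.T @ Z


def partial_correlations(Omega):
    """Convert a precision matrix into task-task partial correlations."""
    s = np.sqrt(np.diag(Omega))
    P = -Omega / np.outer(s, s)
    np.fill_diagonal(P, 1.0)
    return P


# Synthetic example: tasks 0 and 1 are strongly related, task 2 is not.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
W = rng.multivariate_normal(np.zeros(3), Sigma, size=500)  # d = 500 features

draws = [sample_task_precision(W, nu0=3, S0=np.eye(3), rng=rng)
         for _ in range(200)]
P = partial_correlations(np.mean(draws, axis=0))
print(np.round(P, 2))
```

Averaging posterior draws of Ω before converting to partial correlations should show a large positive entry for the task pair (0, 1) and entries near zero for pairs involving task 2, mirroring the structure used to generate W. Individual draws could instead be kept to quantify uncertainty in each task relationship.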

Keywords: Alzheimer’s disease progression; Bayesian modeling; Biomedical application; Multitask learning; Structured learning; Uncertainty quantification.