Multivariate functional partial least squares for classification using longitudinal data

Theor Biol Forum. 2021 Jan 1;114(1-2):75-88. doi: 10.19272/202111402007.

Abstract

The use of statistical methods to predict outcomes from high-dimensional datasets is becoming increasingly popular in medicine for forecasting and monitoring patient health. Our work is motivated by a longitudinal dataset containing ¹H NMR spectra of metabolites from 18 patients undergoing kidney transplantation, together with their graft outcomes, which fall into one of three categories: acute rejection, delayed graft function, and primary function. We propose a functional partial least squares (FPLS) model that extends existing PLS methods for longitudinally measured scalar omics data to longitudinally measured functional data. We designed an iterative algorithm to link multiple time points and applied the proposed method to the kidney transplant data. We then compared the AUC of our method with that of univariate methods, each of which uses information from a single time point only; our method appeared to outperform these existing methods. A simulation study mimicking the kidney transplant dataset, but with a larger sample size, was performed to evaluate the new method on larger datasets, using scenarios that vary in how difficult it is to distinguish the two groups. The three-time-point model appeared to perform better than any of the single-time-point models, with average AUCs of 0.909 and 0.811, respectively.
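The abstract does not spell out the FPLS model or its iterative algorithm for linking time points, so the sketch below is only a rough, hedged illustration of the general idea and is not the authors' method. It assumes each subject's spectra are sampled on a common grid at every time point, concatenates the time points into one feature row, fits ordinary PLS1 (NIPALS) to a 0/1-coded two-group outcome (as in the simulation), and scores the fit with the rank-based AUC used for the comparison. The function names (`stack_time_points`, `pls1_fit`, `pls1_predict`, `auc`) and the toy data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def stack_time_points(curves):
    """Concatenate one subject's curves (one list per time point) into a single feature row."""
    return [value for curve in curves for value in curve]


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 for a single 0/1-coded response; returns the pieces needed to predict."""
    n, p = len(X), len(X[0])
    x_means = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        # Weight vector: feature direction with maximal covariance with the residual response.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # latent scores
        tt = sum(v * v for v in t)
        p_k = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_k = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        Xc = [[Xc[i][j] - t[i] * p_k[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - t[i] * q_k for i in range(n)]
        weights.append(w)
        loadings.append(p_k)
        y_loadings.append(q_k)
    return x_means, y_mean, weights, loadings, y_loadings


def pls1_predict(model, x):
    """Score a new feature row by replaying the same centring and deflation steps."""
    x_means, y_mean, weights, loadings, y_loadings = model
    xc = [a - m for a, m in zip(x, x_means)]
    score = y_mean
    for w, p_k, q_k in zip(weights, loadings, y_loadings):
        t = sum(a * b for a, b in zip(xc, w))
        xc = [a - t * b for a, b in zip(xc, p_k)]
        score += t * q_k
    return score


def auc(scores, labels):
    """Rank-based AUC: chance a random case (label 1) outscores a random control (ties = 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 40 subjects, 3 time points, 20 grid points per curve;
    # group 1 has a bump on grid points 5-10 at every time point.
    rng = random.Random(0)
    labels = [i % 2 for i in range(40)]
    rows = []
    for label in labels:
        curves = [[rng.gauss(0.0, 0.5) + (1.0 if label == 1 and 5 <= g <= 10 else 0.0)
                   for g in range(20)] for _ in range(3)]
        rows.append(stack_time_points(curves))
    model = pls1_fit(rows, labels, n_components=2)
    scores = [pls1_predict(model, row) for row in rows]
    print("in-sample AUC:", auc(scores, labels))
```

The in-sample AUC printed here is optimistic; a fair comparison between a multi-time-point model and single-time-point models would score held-out subjects, for example via cross-validation. Prediction replays the deflation steps rather than forming the regression coefficients explicitly, which avoids a matrix inversion and keeps the sketch dependency-free.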

Keywords: Classification; Functional Data; Longitudinal Data; Partial Least Squares; Prediction.

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Humans
  • Least-Squares Analysis
  • Proton Magnetic Resonance Spectroscopy
  • Sample Size