Efficient Algorithms and Implementation of a Semiparametric Joint Model for Longitudinal and Competing Risk Data: With Applications to Massive Biobank Data

Comput Math Methods Med. 2022 Feb 8:2022:1362913. doi: 10.1155/2022/1362913. eCollection 2022.

Abstract

Semiparametric joint models of longitudinal and competing risk data are computationally costly, and their current implementations do not scale well to massive biobank data. This paper identifies and addresses some key computational barriers in a semiparametric joint model for longitudinal and competing risk survival data. By developing and implementing customized linear scan algorithms, we reduce the computational complexities from O(n^2) or O(n^3) to O(n) in various steps including numerical integration, risk set calculation, and standard error estimation, where n is the number of subjects. Using both simulated and real-world biobank data, we demonstrate that these linear scan algorithms can speed up the existing methods by a factor of up to hundreds of thousands when n > 10^4, often reducing the runtime from days to minutes. We have developed an R package, FastJM, based on the proposed algorithms for joint modeling of longitudinal and competing risk time-to-event data and made it publicly available on the Comprehensive R Archive Network (CRAN).
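To illustrate the kind of saving a linear scan can give in risk set calculation, the sketch below compares a naive quadratic computation of risk-set sums with a single-pass version. It is a minimal illustration of the general idea, not the FastJM implementation: the function names, the plain-list inputs, and the assumption that subjects are already sorted by descending observed time are all hypothetical choices made for this example.

```python
from typing import List


def risk_set_sums_naive(times: List[float], weights: List[float]) -> List[float]:
    """O(n^2): for each subject i, sum the weights of all j with t_j >= t_i."""
    return [
        sum(w for t, w in zip(times, weights) if t >= ti)
        for ti in times
    ]


def risk_set_sums_linear_scan(times: List[float], weights: List[float]) -> List[float]:
    """O(n) single pass, assuming `times` is sorted in descending order.

    A running total is carried down the sorted list, so each weight is
    added exactly once. Tied times are grouped so that every subject in
    a tie shares the same risk-set sum.
    """
    n = len(times)
    out = [0.0] * n
    running = 0.0
    i = 0
    while i < n:
        # Absorb the whole block of subjects tied at times[i].
        j = i
        while j < n and times[j] == times[i]:
            running += weights[j]
            j += 1
        # Everyone in the tie block is at risk together.
        for k in range(i, j):
            out[k] = running
        i = j
    return out


# Hypothetical toy data: observed times (descending) and weights such as
# exp(linear predictor) for each subject.
times = [5.0, 4.0, 4.0, 2.0, 1.0]
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
naive = risk_set_sums_naive(times, weights)
fast = risk_set_sums_linear_scan(times, weights)
```

Both functions return the same sums on the toy data; the scan touches each subject a constant number of times, whereas the naive version revisits all n subjects for every subject. Sorting beforehand costs O(n log n) once and can be reused across iterations.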

MeSH terms

  • Algorithms*
  • Biological Specimen Banks / statistics & numerical data*
  • Bronchodilator Agents / therapeutic use
  • Computational Biology
  • Computer Simulation
  • Data Interpretation, Statistical
  • Disease Progression
  • Humans
  • Longitudinal Studies
  • Models, Statistical*
  • Primary Health Care / statistics & numerical data
  • Pulmonary Disease, Chronic Obstructive / physiopathology
  • Pulmonary Disease, Chronic Obstructive / therapy
  • Risk Assessment
  • Smoking Cessation / statistics & numerical data
  • Software

Substances

  • Bronchodilator Agents