Fitting the Cox proportional hazards model to big data

Biometrics. 2024 Jan 29;80(1):ujae018. doi: 10.1093/biomtc/ujae018.

Abstract

The semiparametric Cox proportional hazards model, together with the partial likelihood principle, has been widely used to study the effects of potentially time-dependent covariates on a possibly censored event time. We propose a computationally efficient method for fitting the Cox model to big data involving millions of study subjects. Specifically, we perform maximum partial likelihood estimation on a small subset of the whole data and improve the initial estimator by incorporating the remaining data through one-step estimation with estimated efficient score functions. We show that the final estimator has the same asymptotic distribution as the conventional maximum partial likelihood estimator using the whole dataset, while requiring only a small fraction of the computation time. We demonstrate the usefulness of the proposed method through extensive simulation studies and an application to the UK Biobank data.
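The two-stage structure described above (a subsample estimator followed by a one-step correction) can be illustrated with a minimal sketch. It assumes time-fixed covariates, no stratification, and Breslow handling of ties. The sketch fits the maximum partial likelihood estimator on a random subsample by Newton-Raphson, then takes a single Newton step using the full-data score and observed information evaluated at that initial estimate. This is the generic one-step estimator, not the authors' procedure, which uses estimated efficient score functions and targets time-dependent covariates. All function names (`cox_score_info`, `solve`, `newton_mple`, `one_step_cox`, `simulate_cox_data`) are hypothetical.

```python
import math
import random


def cox_score_info(times, events, X, beta):
    """Score and observed information of the Cox log partial likelihood.
    Assumes time-fixed covariates; ties are handled by the Breslow rule."""
    n, p = len(times), len(beta)
    order = sorted(range(n), key=lambda i: -times[i])  # latest times first
    S0, S1 = 0.0, [0.0] * p
    S2 = [[0.0] * p for _ in range(p)]
    U = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    k = 0
    while k < n:
        t, j = times[order[k]], k
        # Grow the risk set {i : times[i] >= t} with every subject tied at t.
        while j < n and times[order[j]] == t:
            x = X[order[j]]
            w = math.exp(sum(bk * xk for bk, xk in zip(beta, x)))
            S0 += w
            for a in range(p):
                S1[a] += w * x[a]
                for b in range(p):
                    S2[a][b] += w * x[a] * x[b]
            j += 1
        # Each event at t adds x_i - xbar(t) to U and the risk-set covariance to info.
        for m in range(k, j):
            i = order[m]
            if events[i]:
                for a in range(p):
                    U[a] += X[i][a] - S1[a] / S0
                    for b in range(p):
                        info[a][b] += S2[a][b] / S0 - (S1[a] / S0) * (S1[b] / S0)
        k = j
    return U, info


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[r]] for r, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def newton_mple(times, events, X, iters=50, tol=1e-10):
    """Maximum partial likelihood estimate by Newton-Raphson from beta = 0."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        U, info = cox_score_info(times, events, X, beta)
        step = solve(info, U)
        beta = [bk + sk for bk, sk in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def one_step_cox(times, events, X, subsample_size, seed=0):
    """Hypothetical helper: subsample MPLE, then one full-data Newton step."""
    idx = random.Random(seed).sample(range(len(times)), subsample_size)
    beta0 = newton_mple([times[i] for i in idx], [events[i] for i in idx],
                        [X[i] for i in idx])
    U, info = cox_score_info(times, events, X, beta0)  # single full-data pass
    step = solve(info, U)
    return beta0, [bk + sk for bk, sk in zip(beta0, step)]


def simulate_cox_data(n, beta, censor_rate=0.5, seed=0):
    """Hypothetical data: unit exponential baseline hazard, Gaussian covariates."""
    rng = random.Random(seed)
    times, events, X = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        t_event = rng.expovariate(math.exp(sum(bk * xk for bk, xk in zip(beta, x))))
        t_cens = rng.expovariate(censor_rate)
        times.append(min(t_event, t_cens))
        events.append(t_event <= t_cens)
        X.append(x)
    return times, events, X


if __name__ == "__main__":
    times, events, X = simulate_cox_data(3000, [0.5, -0.5], seed=1)
    beta0, beta1 = one_step_cox(times, events, X, subsample_size=500, seed=2)
    print("subsample MPLE:", beta0)
    print("one-step      :", beta1)
    print("full MPLE     :", newton_mple(times, events, X))
```

In standard one-step theory, a single Newton step shrinks the gap to the full-data estimator roughly quadratically, so the subsample must be large enough that the initial estimate is already close. With time-fixed covariates the full-data score pass is cheap after sorting, so this sketch shows the structure of the approach rather than the computational savings reported for the time-dependent setting.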

Keywords: censoring; efficient score; one-step estimation; partial likelihood; time complexity; time-dependent covariates.

MeSH terms

  • Big Data*
  • Computer Simulation
  • Humans
  • Probability
  • Proportional Hazards Models
  • UK Biobank*