Regression trees and ensembles for cumulative incidence functions

Int J Biostat. 2022 Mar 25;18(2):397-419. doi: 10.1515/ijb-2021-0014. eCollection 2022 Nov 1.

Abstract

The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past two decades. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and ensemble methods, have begun only comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods use augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and they are easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) are used to illustrate these new methods.
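The abstract does not state the form of the augmented Brier score estimator. As a rough illustration of the kind of loss involved, the sketch below computes a standard inverse-probability-of-censoring-weighted (IPCW) Brier score for a predicted cumulative incidence at one fixed time t. This is a simpler, non-augmented stand-in, not the estimator proposed in the paper. The function names, the status coding (0 = censored, 1 = cause of interest, 2 = competing event) and the Kaplan-Meier model for censoring are assumptions made for illustration.

```python
from typing import Callable, Sequence


def km_censoring_survival(times: Sequence[float],
                          status: Sequence[int]) -> Callable[..., float]:
    """Hypothetical helper: Kaplan-Meier estimate of G(s) = P(C > s).

    Censoring (status == 0) is treated as the 'event'; an observed failure of
    any cause ends follow-up for the censoring process.
    """
    cens_times = sorted({t for t, d in zip(times, status) if d == 0})
    jumps = []  # (time, value of G just after that time)
    surv = 1.0
    for u in cens_times:
        at_risk = sum(1 for t in times if t >= u)
        n_cens = sum(1 for t, d in zip(times, status) if t == u and d == 0)
        surv *= 1.0 - n_cens / at_risk
        jumps.append((u, surv))

    def G(s: float, left_limit: bool = False) -> float:
        # Right-continuous step function; left_limit=True returns G(s-).
        value = 1.0
        for u, sv in jumps:
            if u < s or (u == s and not left_limit):
                value = sv
            else:
                break
        return value

    return G


def ipcw_brier_cif(times: Sequence[float], status: Sequence[int],
                   cif_pred: Sequence[float], t: float, cause: int = 1) -> float:
    """Hypothetical IPCW Brier score for predicted CIF values F(t | X_i)."""
    G = km_censoring_survival(times, status)
    total = 0.0
    for T_i, d_i, F_i in zip(times, status, cif_pred):
        if T_i <= t and d_i != 0:
            # An event of some cause was observed by t, so the status at t is known.
            w = 1.0 / G(T_i, left_limit=True)
            y = 1.0 if d_i == cause else 0.0
        elif T_i > t:
            # Still event-free at t: the cause-specific indicator is 0.
            w = 1.0 / G(t)
            y = 0.0
        else:
            # Censored at or before t: status at t unknown, weight 0.
            continue
        total += w * (y - F_i) ** 2
    return total / len(times)
```

For example, with `times = [1, 2, 3, 4]`, `status = [1, 0, 1, 2]`, a constant prediction of 0.5 and `t = 3.5`, the censored subject gets weight 0 and the two subjects observed after the censoring time get weight 1.5, giving a score of 0.25. A tree method could in principle compare such losses across candidate splits, but how the paper's augmented estimator enters its building and pruning rules is not described in this abstract.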

Keywords: Brier score; CART; Fine and Gray model; cause-specific hazard; competing risks; random forests.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Incidence
  • Machine Learning*