A novel machine learning prediction model for metastasis in breast cancer

Cancer Rep (Hoboken). 2024 Mar;7(3):e2006. doi: 10.1002/cnr2.2006.

Abstract

Background: Breast cancer (BC) metastasis is a common cause of high mortality. Conventional prognostic criteria cannot accurately predict the risk of BC metastasis. Machine learning techniques can overcome this limitation of conventional models.

Aim: We developed a model to predict BC metastasis using the random survival forest (RSF) method.

Methods: Based on demographic data and routine clinical data, we used RSF-recursive feature elimination to identify the predictive variables and developed a model to predict metastasis using the RSF method. The area under the receiver operating characteristic curve (AUROC) and Kaplan-Meier (KM) survival analyses were used to validate predictive performance, while the C-index was used to assess discrimination and the Brier score was used to assess calibration of the predictive model.
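
As an illustration of the discrimination metric named above, the following is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' code: the function name `concordance_index`, the toy data, and the choice to skip pairs with tied follow-up times are assumptions made for this example. A pair of patients is comparable when the one with the shorter follow-up actually experienced the event; the pair is concordant when that patient also received the higher predicted risk.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times       -- observed follow-up times
    events      -- 1 if the event (e.g. metastasis) was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption for this sketch: pairs with tied times are skipped.
        # A pair is also not comparable if the earlier subject was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.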

Results: We developed a metastasis prediction model comprising three variables (pathological stage, aspartate aminotransferase, and neutrophil count) selected by RSF-recursive feature elimination. The model was reliable and stable when assessed by the AUROC (0.932 in the training set and 0.905 in the validation set) and KM survival analyses (p < .0001). The C-index (0.959) and Brier score (0.097) also confirmed the good predictive ability of this model.
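
To make the Brier score concrete, here is a minimal sketch of how such a score can be computed at a single time horizon. It is a simplification and not the authors' procedure: it assumes every patient's status at the horizon is known (no censoring before that time), whereas survival analyses usually apply inverse-probability-of-censoring weights. The function name `brier_score` and the toy values are hypothetical. Lower scores indicate better-calibrated predictions, with 0 being perfect.

```python
def brier_score(event_free_at_t, predicted_event_free_prob):
    """Mean squared difference between observed status and predicted probability.

    event_free_at_t          -- 1 if the patient was still event-free at the
                                horizon, 0 if the event had occurred
    predicted_event_free_prob -- model's predicted probability of being
                                event-free at the same horizon
    Assumption for this sketch: no patient is censored before the horizon.
    """
    if len(event_free_at_t) != len(predicted_event_free_prob):
        raise ValueError("inputs must have the same length")
    if not event_free_at_t:
        raise ValueError("inputs must not be empty")
    squared_errors = [
        (observed - predicted) ** 2
        for observed, predicted in zip(event_free_at_t, predicted_event_free_prob)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy data: observed status and predicted probabilities.
toy_status = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.6, 0.4]
toy_brier = brier_score(toy_status, toy_probs)
```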

Conclusions: This model relies on routine data and examination indicators available in real-time clinical practice and achieves accurate predictive performance without increasing costs for patients. Using this model, clinicians can facilitate risk communication and provide precise, efficient, individualized therapy to patients with breast cancer.

Keywords: breast cancer; metastasis; predictive model; random survival forest; recursive feature elimination.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Breast Neoplasms* / diagnosis
  • Breast Neoplasms* / therapy
  • Communication
  • Female
  • Humans
  • Leukocyte Count
  • Machine Learning
  • Neoplasms, Second Primary*