Machine-Learning-Aided Prediction of Brain Metastases Development in Non-Small-Cell Lung Cancers

Clin Lung Cancer. 2023 Dec;24(8):e311-e322. doi: 10.1016/j.cllc.2023.08.002. Epub 2023 Aug 6.

Abstract

Purpose: Non-small-cell lung cancer (NSCLC) shows a high incidence of brain metastases (BM). Early detection is crucial to improve clinical prospects. We trained and validated classifier models to identify patients with a high risk of developing BM, as they could potentially benefit from surveillance brain MRI.

Methods: Consecutive patients with an initial diagnosis of NSCLC from January 2011 to April 2019 and an in-house chest-CT scan (staging) were retrospectively recruited at a German lung cancer center. Brain imaging was performed at initial diagnosis and in case of neurological symptoms (follow-up). Subjects lost to follow-up or still alive without BM at the data cut-off point (12/2020) were excluded. Covariates included clinical features and/or 3D radiomics features of the primary tumor from staging chest-CT. Four machine learning models for prediction (80/20 training/test split) were compared. Gini importance and SHAP were used as measures of feature importance; sensitivity, specificity, area under the precision-recall curve, and the Matthews correlation coefficient were used as evaluation metrics.
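
As an illustration of three of the evaluation metrics named above, the following minimal sketch computes sensitivity, specificity, and the Matthews correlation coefficient from binary labels (1 = developed BM, 0 = did not). It is not the study's code; the function names and the label encoding are assumptions made for this example.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = BM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Recall: fraction of true BM cases that the model flags."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of non-BM cases that the model correctly leaves unflagged."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, not study data: 10 BM cases and 10 non-BM cases.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [0] * 6 + [1] * 4
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it informative when the outcome classes are imbalanced.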

Results: Three hundred and ninety-five patients comprised the clinical cohort. Predictive models based on clinical features offered the best performance (tuned to maximize recall: sensitivity∼70%, specificity∼60%). Radiomics features failed to provide sufficient information, likely due to the heterogeneity of imaging data. Adenocarcinoma histology, lymph node invasion, and histological tumor grade were positively correlated with the prediction of BM; age and squamous cell carcinoma histology were negatively correlated. A subgroup discovery analysis identified two candidate patient subpopulations that appeared to present a higher risk of BM (female patients with adenocarcinoma histology, and adenocarcinoma patients with no other distant metastases).
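
The abstract does not state how the models were tuned toward recall. One common approach, shown here purely as an assumption and not as the authors' method, is to lower the decision threshold on predicted risk scores until a target sensitivity is reached, then report the specificity obtained at that threshold. The function names and the toy scores below are hypothetical.

```python
def pick_threshold(y_true, scores, target_sensitivity):
    """Return the highest score threshold whose sensitivity meets the target.

    A case is predicted positive when its score is >= the threshold.
    Returns None if there are no positive cases.
    """
    positives = sum(y_true)
    if positives == 0:
        return None
    # Try thresholds from strictest to most permissive.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        if tp / positives >= target_sensitivity:
            return thr
    return None


def specificity_at(y_true, scores, thr):
    """Fraction of negative cases scoring below the threshold."""
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    return sum(1 for s in negatives if s < thr) / len(negatives) if negatives else 0.0


# Hypothetical toy data, not study data.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2]
thr = pick_threshold(y_true, scores, target_sensitivity=0.66)
print(thr, specificity_at(y_true, scores, thr))
```

In practice the threshold would be chosen on training or validation data and then applied unchanged to the held-out test set, so that the reported sensitivity and specificity are not optimistically biased.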

Conclusion: Analysis of the importance of input features suggests that the models are learning the relevant relationships between clinical features and the development of BM. Increasing the number of samples should be prioritized to improve performance. Employed prospectively at initial diagnosis, such models can help select high-risk subgroups for surveillance brain MRI.

Keywords: Interpretable machine learning; NSCLC; Predictive models; Radiomics; Secondary brain cancer.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma*
  • Brain Neoplasms* / diagnostic imaging
  • Brain Neoplasms* / secondary
  • Carcinoma, Non-Small-Cell Lung* / pathology
  • Female
  • Humans
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / pathology
  • Machine Learning
  • Retrospective Studies