Cross-site validation of lung cancer diagnosis by electronic nose with deep learning: a multicenter prospective study

Respir Res. 2024 May 10;25(1):203. doi: 10.1186/s12931-024-02840-z.

Abstract

Background: Although the electronic nose (eNose) has been intensively investigated for diagnosing lung cancer, cross-site validation remains a major obstacle, and no cross-site validation studies have yet been performed.

Methods: Patients with lung cancer, as well as healthy control and diseased control groups, were prospectively recruited from two referral centers between 2019 and 2022. Deep learning models for detecting lung cancer from eNose breathprints were developed using a training cohort from one site and then tested on a cohort from the other site. Semi-Supervised Domain-Generalized (Semi-DG) Augmentation (SDA) and Noise-Shift Augmentation (NSA) methods, with or without fine-tuning, were applied to improve performance.
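The abstract does not specify how the augmentation methods operate on the sensor signals. As a rough illustration of the general idea behind noise-and-shift style augmentation, the sketch below perturbs each training breathprint with a random baseline offset plus per-sample Gaussian noise, so that a model sees plausible device-to-device variation. The function names, the parameters (noise_sd, max_shift, copies), and the perturbation model itself are assumptions made for illustration and are not the study's NSA implementation.

```python
import random

def noise_shift_augment(signal, noise_sd=0.02, max_shift=0.05, rng=None):
    """Return a perturbed copy of one sensor trace (hypothetical scheme).

    A single baseline offset is drawn uniformly from [-max_shift, max_shift]
    and added to every sample, then independent Gaussian noise with standard
    deviation noise_sd is added to each sample.
    """
    rng = rng or random.Random()
    offset = rng.uniform(-max_shift, max_shift)
    return [x + offset + rng.gauss(0.0, noise_sd) for x in signal]

def augment_cohort(signals, labels, copies=3, seed=0):
    """Keep the original traces and append `copies` perturbed versions of each."""
    rng = random.Random(seed)
    out_x, out_y = list(signals), list(labels)
    for x, y in zip(signals, labels):
        for _ in range(copies):
            out_x.append(noise_shift_augment(x, rng=rng))
            out_y.append(y)  # augmentation never changes the diagnosis label
    return out_x, out_y
```

In a cross-site design of this kind, such augmentation would be applied only to the training cohort from the first site; the cohort from the second site stays untouched so that it remains an independent test set.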

Results: A total of 231 participants were enrolled, comprising a training/validation cohort of 168 individuals (90 with lung cancer, 16 healthy controls, and 62 diseased controls) and a test cohort of 63 individuals (28 with lung cancer, 10 healthy controls, and 25 diseased controls). The model performed satisfactorily in the validation cohort from the same hospital, whereas directly applying the trained model to the test cohort yielded suboptimal results (AUC: 0.61, 95% CI: 0.47-0.76). Performance improved after applying data augmentation methods in the training cohort (SDA, AUC: 0.89 [0.81-0.97]; NSA, AUC: 0.90 [0.89-1.00]). Fine-tuning improved performance further (SDA plus fine-tuning, AUC: 0.95 [0.89-1.00]; NSA plus fine-tuning, AUC: 0.95 [0.90-1.00]).
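For readers unfamiliar with how an AUC and its 95% confidence interval can be obtained on a small test cohort (here 28 cancer cases and 35 controls), the following is a minimal sketch using the rank-based definition of AUC and a percentile bootstrap. The abstract does not state which interval method the authors used, so the bootstrap, the function names, and the defaults (n_boot, alpha, seed) are assumptions for illustration only.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: the probability that a positive case scores higher
    than a negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_with_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Return (point estimate, lower bound, upper bound) using a
    percentile bootstrap over participants."""
    point = auc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper
```

With only 63 test participants, intervals computed this way tend to be wide, which is consistent with the spread of the intervals reported above.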

Conclusion: Our study showed that deep learning models developed for eNose breathprints can be validated across sites when combined with data augmentation and fine-tuning. Accordingly, eNose breathprints emerge as a convenient, non-invasive, and potentially generalizable solution for lung cancer detection.

Clinical trial registration: This study is not a clinical trial and was therefore not registered.

Keywords: Breathprint; Cross-site validation; Data augmentation; Deep learning; Electronic nose; Lung cancer.

Publication types

  • Multicenter Study
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Breath Tests / methods
  • Deep Learning*
  • Electronic Nose*
  • Female
  • Humans
  • Lung Neoplasms* / diagnosis
  • Male
  • Middle Aged
  • Prospective Studies
  • Reproducibility of Results