Cross-site validation of lung cancer diagnosis by electronic nose with deep learning: a multicenter prospective study

Meng-Rui Lee; Mu-Hsiang Kao; Ya-Chu Hsieh; Min Sun; Kea-Tiong Tang; Jann-Yuan Wang; Chao-Chi Ho; Jin-Yuan Shih; Chong-Jen Yu

doi:10.1186/s12931-024-02840-z

Respir Res. 2024 May 10;25(1):203. doi: 10.1186/s12931-024-02840-z.

Authors

Meng-Rui Lee¹ ², Mu-Hsiang Kao³, Ya-Chu Hsieh³, Min Sun⁴, Kea-Tiong Tang⁵, Jann-Yuan Wang¹, Chao-Chi Ho¹, Jin-Yuan Shih¹, Chong-Jen Yu¹ ²

Affiliations

¹ Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan.
² Department of Internal Medicine, National Taiwan University Hospital Hsin-Chu Branch, Hsin-Chu, Taiwan.
³ Department of Electrical Engineering, National Tsing Hua University, No. 101, Sec. 2, Kuang-Fu Road, Hsinchu, 30013, Taiwan.
⁴ Department of Electrical Engineering, National Tsing Hua University, No. 101, Sec. 2, Kuang-Fu Road, Hsinchu, 30013, Taiwan. sunmin@ee.nthu.edu.tw.
⁵ Department of Electrical Engineering, National Tsing Hua University, No. 101, Sec. 2, Kuang-Fu Road, Hsinchu, 30013, Taiwan. kttang@ee.nthu.edu.tw.

Abstract

Background: Although the electronic nose (eNose) has been intensively investigated for diagnosing lung cancer, cross-site validation remains a major obstacle, and no cross-site validation studies have yet been performed.

Methods: Patients with lung cancer, as well as healthy control and diseased control groups, were prospectively recruited from two referral centers between 2019 and 2022. Deep learning models for detecting lung cancer from eNose breathprints were developed using a training cohort from one site and then tested on a cohort from the other site. Semi-Supervised Domain-Generalized (Semi-DG) Augmentation (SDA) and Noise-Shift Augmentation (NSA) methods, with or without fine-tuning, were applied to improve performance.
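The abstract does not describe how NSA is computed. Purely as an illustration of what a noise-plus-offset augmentation of sensor traces could look like, the sketch below perturbs each trace with one random baseline shift and per-sample Gaussian noise. The function names, the parameters `noise_std`, `max_shift`, and `copies`, and the perturbation scheme itself are assumptions made for this example, not the authors' method.

```python
import random


def noise_shift_augment(signal, noise_std=0.01, max_shift=0.05, rng=None):
    """Return a perturbed copy of one sensor trace (a list of floats).

    Hypothetical scheme: a single random baseline offset for the whole
    trace, plus independent Gaussian noise added to every sample.
    """
    rng = rng or random.Random()
    shift = rng.uniform(-max_shift, max_shift)
    return [x + shift + rng.gauss(0.0, noise_std) for x in signal]


def augment_dataset(traces, labels, copies=3, seed=0):
    """Keep each original trace and append `copies` perturbed versions.

    Every perturbed trace inherits the label of the trace it came from.
    """
    rng = random.Random(seed)
    out_x, out_y = [], []
    for trace, label in zip(traces, labels):
        out_x.append(list(trace))
        out_y.append(label)
        for _ in range(copies):
            out_x.append(noise_shift_augment(trace, rng=rng))
            out_y.append(label)
    return out_x, out_y


if __name__ == "__main__":
    # Toy traces; real breathprints would be multi-sensor time series.
    x, y = augment_dataset([[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]], [1, 0])
    print(len(x), y)
```

Augmentation of this kind would be applied only to the training cohort, so that the model sees plausible sensor drift before it meets data from the other site.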

Results: A total of 231 participants were enrolled, comprising a training/validation cohort of 168 individuals (90 with lung cancer, 16 healthy controls, and 62 diseased controls) and a test cohort of 63 individuals (28 with lung cancer, 10 healthy controls, and 25 diseased controls). The model performed satisfactorily in the validation cohort from the same hospital, whereas directly applying the trained model to the test cohort yielded suboptimal results (AUC: 0.61, 95% CI: 0.47–0.76). Performance improved after applying data augmentation methods in the training cohort (SDA, AUC: 0.89 [0.81–0.97]; NSA, AUC: 0.90 [0.89–1.00]). Performance improved further after fine-tuning (SDA plus fine-tuning, AUC: 0.95 [0.89–1.00]; NSA plus fine-tuning, AUC: 0.95 [0.90–1.00]).
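The abstract reports each AUC with a 95% confidence interval but does not state how the intervals were obtained. As a minimal sketch of one common approach, the code below computes the AUC as the probability that a positive case outscores a negative one and derives a percentile bootstrap interval. The bootstrap choice, the function names, and the toy labels and scores are assumptions for illustration only and are not study data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    point = auc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


if __name__ == "__main__":
    # Toy data: 1 = lung cancer, 0 = control.
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
    print(bootstrap_auc_ci(y, s))
```

With a test cohort of 63 participants, intervals of this kind are expected to be wide, which is consistent with the ranges reported above.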

Conclusion: Our study showed that deep learning models developed for eNose breathprints can be validated across sites when combined with data augmentation and fine-tuning. Accordingly, eNose breathprints emerge as a convenient, non-invasive, and potentially generalizable solution for lung cancer detection.

Clinical trial registration: This study is not a clinical trial and was therefore not registered.

Keywords: Breathprint; Cross-site validation; Data augmentation; Deep learning; Electronic nose; Lung cancer.

Publication types

Multicenter Study
Validation Study

MeSH terms

Adult
Aged
Breath Tests / methods
Deep Learning*
Electronic Nose*
Female
Humans
Lung Neoplasms* / diagnosis
Male
Middle Aged
Prospective Studies
Reproducibility of Results
