Machine learning application in personalised lung cancer recurrence and survivability prediction

Comput Struct Biotechnol J. 2022 Apr 4:20:1811-1820. doi: 10.1016/j.csbj.2022.03.035. eCollection 2022.

Abstract

Machine learning is an important artificial intelligence technique that is widely applied in cancer diagnosis and detection. More recently, with the rise of personalised and precision medicine, there is a growing trend towards machine learning applications for prognosis prediction. However, building reliable prediction models of cancer outcomes for everyday clinical practice remains a hurdle. In this work, we integrate genomic, clinical and demographic data of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) patients from The Cancer Genome Atlas (TCGA) and introduce copy number variation (CNV) and mutation information of 15 selected genes to generate predictive models for recurrence and survivability. We compare the accuracy and benefits of three well-established machine learning algorithms: decision tree methods, neural networks and support vector machines. Although the predictive models built with the decision tree method show no significant advantage in accuracy over the other two methods, the tree models reveal the most important predictors among genomic information (e.g. KRAS, EGFR, TP53), clinical status (e.g. TNM stage and radiotherapy) and demographics (e.g. age and gender), and show how these predictors influence the prediction of recurrence and survivability for both early-stage LUAD and LUSC. The machine learning models have the potential to help clinicians make personalised decisions on aspects such as follow-up timeline and to assist with personalised planning of future social care needs.
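
As an illustration only, the kind of three-way comparison described above can be sketched with scikit-learn. This is not the authors' pipeline: the data are synthetic (generated with `make_classification`, not TCGA), and the hyperparameters, the use of 5-fold cross-validated AUC as the metric, and all variable names are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a patient table (NOT TCGA data): 300 rows,
# 10 numeric features, and a binary label playing the role of "recurrence".
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Hypothetical hyperparameters; the scaled pipelines keep the neural
# network and the SVM from being dominated by feature magnitudes.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean ROC AUC over 5 cross-validation folds for each model.
auc = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}
for name, score in auc.items():
    print(f"{name}: mean AUC = {score:.3f}")

# A fitted tree exposes per-feature importances directly, which is the
# interpretability benefit the abstract attributes to tree models.
tree = models["decision_tree"].fit(X, y)
importances = tree.feature_importances_
top = sorted(range(X.shape[1]), key=lambda i: importances[i], reverse=True)[:3]
print("indices of the three most important features:", top)
```

On real data the feature indices would map to named predictors such as gene-level CNV or mutation status, TNM stage, or age, and the ranking would depend on the cohort and on how the tree is tuned.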

Keywords: ANNs, artificial neural networks; ANOVA, analysis of variance; AUC, the area under the ROC curve; CART, classification and regression tree; CNV, copy number variation; DTs, decision trees; Decision tree; FFNN, feedforward neural networks; LS-SVM, least-squares support vector machine; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; Lung cancer; ML, machine learning; Machine learning; NSCLC, non-small cell lung cancer; Personalized diagnosis and prognosis; ROC, receiver operating characteristic; SVMs, support vector machines; TCGA, The Cancer Genome Atlas; TNM, a common cancer staging system in which T, N and M refer to tumour, node and metastasis.