Multiple Machine Learnings Revealed Similar Predictive Accuracy for Prognosis of PNETs from the Surveillance, Epidemiology, and End Result Database

Yiyan Song; Shaowei Gao; Wulin Tan; Zeting Qiu; Huaqiang Zhou; Yue Zhao

doi:10.7150/jca.26649

J Cancer. 2018 Oct 10;9(21):3971-3978. doi: 10.7150/jca.26649. eCollection 2018.

Authors

Yiyan Song¹ ², Shaowei Gao², Wulin Tan², Zeting Qiu³, Huaqiang Zhou³, Yue Zhao¹

Affiliations

¹ Department of General Surgery, Guangdong Second Provincial General Hospital, Guangzhou, China.
² Department of Anesthesia, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
³ Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.

Abstract

Background: Prognosis prediction is indispensable in clinical practice, and machine learning has proved helpful for it. We aimed to predict survival in patients with pancreatic neuroendocrine tumors (PNETs) using machine learning and to compare the result with the American Joint Committee on Cancer (AJCC) staging system.

Methods: Data on PNET cases were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. Descriptive statistics, multivariate survival analysis, and preprocessing were completed before machine learning. Four algorithms were used to train models: logistic regression (LR), support vector machines (SVM), random forest (RF), and deep learning (DL). Missing data in the database were handled with appropriate imputation, and a sensitivity analysis was performed to evaluate the imputation. The model with the best predictive accuracy was compared with the AJCC staging system on the SEER cases.

Results: The four models had similar predictive accuracy, with no significant difference among them (p = 0.664). The DL model showed slightly better predictive accuracy than the others (81.6% (± 1.9%)), so it was used for the comparison with the AJCC staging system, where it performed better on PNET cases in the SEER database (area under the receiver operating characteristic curve: 0.87 vs 0.76). The sensitivity analysis supported the validity of the missing data imputation.

Conclusions: The models developed with machine learning performed well in predicting survival of PNETs, and the DL model had better accuracy and specificity than the AJCC staging system in SEER data. The DL model has potential for clinical application, but external validation is needed.

Keywords: SEER database; machine learning; pancreatic neuroendocrine tumor; prognostic prediction.
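
Illustrative note: the Results compare the DL model with AJCC staging by area under the receiver operating characteristic curve (AUC). The minimal sketch below shows one common way to compute that quantity, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. Tie handling matters for an ordinal score such as a stage. This is not the authors' code: the function name `roc_auc` and the variables `died`, `model_risk`, and `stage` are hypothetical, and the toy values are invented for illustration and are not SEER records.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return the AUC as P(score of a positive > score of a negative).

    Ties between a positive and a negative count as half a win, which
    keeps the estimate meaningful for coarse, ordinal scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (assumption, not from the paper).
died = [1, 1, 1, 0, 0, 0, 0, 1]                           # 1 = event occurred
model_risk = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.65, 0.3]    # continuous model output
stage = [4, 3, 2, 1, 2, 1, 3, 2]                          # ordinal stage used as a score

auc_model = roc_auc(died, model_risk)
auc_stage = roc_auc(died, stage)
print(f"model AUC = {auc_model:.3f}, stage AUC = {auc_stage:.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for a small illustration. A rank-based implementation would be the usual choice for a registry-sized cohort.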