Effective Cancer Subtype and Stage Prediction via Dropfeature-DNNs

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):107-120. doi: 10.1109/TCBB.2021.3058941. Epub 2022 Feb 3.

Abstract

Precise cancer subtype and/or stage prediction is instrumental for cancer diagnosis, treatment and management. However, most existing methods based on genomic profiles suffer from overfitting, high computational complexity, and selected features (i.e., genes) that are not directly related to predictive precision. These deficiencies stem largely from the "high dimensionality and small sample size" (HDSS) nature inherent in molecular data, which is often regarded as an obstacle to applying deep learning, e.g., deep neural networks (DNNs), in biomedicine and cancer research. In this paper, we propose a DNN-based algorithm coupled with a new embedded feature selection technique, named Dropfeature-DNNs, to address these issues. Dropfeature-DNNs can discard irrelevant features (i.e., genes) while training DNNs, and we formulate Dropfeature-DNNs as an iterative AUC optimization problem. As a result, an "optimal" feature subset containing meaningful genes for accurate tumor subtype and/or stage prediction is obtained when the AUC optimization converges in the training stage. Because the feature subset selection and AUC optimization proceed synchronously with the training phase of DNNs, model complexity and computational cost are reduced simultaneously. Rigorous feature subset convergence analysis and error bound inference provide a solid theoretical foundation for the proposed method. Extensive empirical comparisons with benchmark methods further demonstrate the efficacy of Dropfeature-DNNs in cancer subtype and/or stage prediction using HDSS gene expression data from multiple cancer types.
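The abstract describes Dropfeature-DNNs only at a high level, so the following is a minimal sketch of the general idea of embedded, AUC-guided feature dropping, not the authors' algorithm. Several assumptions are made for illustration: a logistic-regression model stands in for a DNN; the pruning rule (periodically propose dropping the lowest-|weight| fraction of active features, and keep the drop only if validation AUC falls by at most `tol`) is hypothetical; and the names `auc` and `train_with_feature_dropping`, along with all parameter values, are illustrative. AUC is computed as the normalized Mann-Whitney U statistic.

```python
import numpy as np


def auc(scores, labels):
    """Empirical AUC (normalized Mann-Whitney U): the fraction of
    (positive, negative) pairs ranked correctly, with ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def train_with_feature_dropping(X_tr, y_tr, X_val, y_val, epochs=300, lr=0.5,
                                check_every=50, drop_frac=0.25, tol=1e-2):
    """Hypothetical sketch: logistic regression (a stand-in for a DNN) whose
    active feature set is pruned during training, guided by validation AUC."""
    n, d = X_tr.shape
    mask = np.ones(d)           # 1 = feature kept, 0 = feature dropped
    w, b = np.zeros(d), 0.0
    pruning = True
    for epoch in range(1, epochs + 1):
        # One full-batch gradient step on the logistic loss over active features.
        Xm = X_tr * mask
        z = np.clip(Xm @ w + b, -30.0, 30.0)   # clip to avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        g = p - y_tr
        w -= lr * (Xm.T @ g) / n
        b -= lr * g.mean()
        w *= mask               # dropped features stay at exactly zero
        if pruning and epoch % check_every == 0:
            base = auc(X_val @ w + b, y_val)
            active = np.flatnonzero(mask)
            k = int(len(active) * drop_frac)
            if k == 0:
                pruning = False
                continue
            # Propose dropping the k active features with the smallest |weight|.
            drop = active[np.argsort(np.abs(w[active]))[:k]]
            trial = mask.copy()
            trial[drop] = 0.0
            if auc(X_val @ (w * trial) + b, y_val) >= base - tol:
                mask, w = trial, w * trial   # accept: AUC did not degrade beyond tol
            else:
                pruning = False              # treat the subset as converged
    return mask, w, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 50))
    # Synthetic, hypothetical data: only features 0, 1 and 2 carry signal.
    y = (X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.standard_normal(400) > 0).astype(float)
    mask, w, b = train_with_feature_dropping(X[:300], y[:300], X[300:], y[300:])
    print("kept features:", np.flatnonzero(mask))
    print("validation AUC:", round(auc(X[300:] @ w + b, y[300:]), 3))
```

Because pruning happens inside the training loop rather than in a separate selection pass, each accepted drop shrinks the model that the remaining epochs train. This mirrors, in a simplified way, the cost saving the abstract attributes to running feature selection synchronously with training. A real high-dimensional, small-sample setting would also need care to avoid overfitting the validation split that guides pruning.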

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Humans
  • Neoplasms* / genetics
  • Neural Networks, Computer*