Gene shaving using a sensitivity analysis of kernel based machine learning approach, with applications to cancer data

PLoS One. 2019 May 23;14(5):e0217027. doi: 10.1371/journal.pone.0217027. eCollection 2019.

Abstract

Background: Gene shaving (GS) is an essential and challenging tool for biomedical researchers because of the large number of genes in the human genome and the complex nature of biological networks. Most GS methods are not applicable to non-linear and multi-view data sets. While kernel based methods can overcome these problems, a well-founded positive definite kernel based GS method has yet to be proposed for biomedical data analysis.

Methods and findings: Since kernel based methods on genomic information can improve the prediction of diseases, we propose a novel method, "kernel based gene shaving", which is based on the influence function of kernel canonical correlation analysis. To investigate the performance of the proposed method in comparison to state-of-the-art methods in gene shaving, we analyzed extensive simulated and real microarray gene expression data sets. Performance metrics including the true positive rate, true negative rate, false positive rate, false negative rate, misclassification error rate, false discovery rate, and area under the curve were computed for each method. In the colon cancer data analysis, the proposed method identified a significant subset of 210 genes out of 2000 genes and showed suggestively superior performance compared with other methods. The proposed method can be applied to the study of other disease processes in which the analysis of two-view data is a common task.
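As an illustration of the selection metrics listed above, the following minimal sketch computes them for a single gene-selection result, assuming a simulation setting in which the truly relevant genes are known. The function name `selection_metrics`, its arguments, and the toy gene indices are hypothetical and are not taken from the paper; area under the curve is omitted because it requires a ranking score per gene rather than a single selected subset.

```python
def selection_metrics(selected, truly_relevant, n_genes):
    """Confusion-matrix based metrics for one gene-selection result.

    selected        -- indices of genes chosen by a method (hypothetical input)
    truly_relevant  -- indices of genes known to be relevant (simulation truth)
    n_genes         -- total number of genes considered
    """
    selected = set(selected)
    truth = set(truly_relevant)

    tp = len(selected & truth)      # relevant genes that were selected
    fp = len(selected - truth)      # irrelevant genes that were selected
    fn = len(truth - selected)      # relevant genes that were missed
    tn = n_genes - tp - fp - fn     # irrelevant genes correctly left out

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),        # true positive rate
        "TNR": ratio(tn, tn + fp),        # true negative rate
        "FPR": ratio(fp, fp + tn),        # false positive rate
        "FNR": ratio(fn, fn + tp),        # false negative rate
        "MER": ratio(fp + fn, n_genes),   # misclassification error rate
        "FDR": ratio(fp, tp + fp),        # false discovery rate
    }


# Toy example: 10 genes, 3 truly relevant, 4 selected by some method.
metrics = selection_metrics(selected=[0, 1, 2, 3],
                            truly_relevant=[0, 1, 4],
                            n_genes=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

By construction TPR + FNR = 1 and TNR + FPR = 1, so reporting all four is redundant but matches the list of metrics given in the abstract.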

Conclusions: We addressed the challenge of finding unique kernel based GS methods by using the influence function of kernel canonical correlation analysis. The proposed method has been shown to perform better than state-of-the-art methods in gene shaving and has identified many more significant gene interactions, suggesting that genes function in concert in colon cancer. In similar biomedical data analyses, kernel based methods could be applied to select a potential subset of genes. Positive definite kernel based methods can overcome the non-linearity problem and improve the prediction process.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • Artificial Intelligence
  • Colonic Neoplasms / diagnosis*
  • Colonic Neoplasms / genetics*
  • Computational Biology
  • Computer Simulation
  • False Positive Reactions
  • Gene Expression Profiling
  • Genetic Techniques*
  • Humans
  • Machine Learning*
  • Nonlinear Dynamics
  • Oligonucleotide Array Sequence Analysis
  • Software