Multi-Approach Bioinformatics Analysis of Curated Omics Data Provides a Gene Expression Panorama for Multiple Cancer Types

Front Genet. 2020 Nov 23:11:586602. doi: 10.3389/fgene.2020.586602. eCollection 2020.

Abstract

Studies describing the expression patterns and biomarkers for the tumoral process increase in number every year. The availability of new datasets, although essential, also creates a confusing landscape where common or critical mechanisms are obscured by the divergent and heterogeneous nature of such results. In this work, we manually curated the Gene Expression Omnibus using rigorous filtering criteria to select the most homogeneous and highest-quality microarray and RNA-seq datasets from multiple types of cancer. By applying systems biology approaches, combined with machine learning analysis, we investigated molecular mechanisms underlying the tumoral process that may be frequently deregulated. Our multi-approach analysis of 99 curated datasets, composed of 5,406 samples, revealed 47 genes differentially expressed in all analyzed cancer types, all of which agreed with the validation using TCGA data. Results suggest that the tumoral process is more related to the overexpression of core deregulated machinery than to the underexpression of a given gene set. Additionally, we identified previously undescribed gene expression similarities between different cancer types and performed an overall survival analysis using 20 cancer types. Finally, we were able to suggest a core regulatory mechanism that could be frequently deregulated.

Keywords: bioinformatics; cancer; machine learning; omics; overall survival; regulatory networks; systems biology.