Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder

Nat Commun. 2020 Feb 12;11(1):856. doi: 10.1038/s41467-020-14666-6.

Abstract

Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet the biological networks that commonly define such modules are incomplete and biased toward well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without prior knowledge of a biological network, by instead training a deep autoencoder on large transcriptional data. We hypothesize that modules could be discovered within the autoencoder representations. We find a statistically significant enrichment of genes relevant to genome-wide association studies (GWAS) in the last layer, and to a successively lesser degree in the middle and first layers. In contrast, we find an opposite gradient, where a modular protein-protein interaction signal is strongest in the first layer but then vanishes smoothly deeper in the network. We conclude that a data-driven discovery approach is sufficient to discover groups of disease-related genes.
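
To make the notion of "statistically significant enrichment" concrete, the following minimal sketch computes a one-sided hypergeometric tail probability for the overlap between a candidate gene module and a set of GWAS-relevant genes. The abstract does not state which test the authors used, so the choice of the hypergeometric test is an assumption here, and the function name `enrichment_p_value` and all numbers are hypothetical illustrations.

```python
from math import comb


def enrichment_p_value(n_genes, n_disease, module_size, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    Assumption: the module is treated as a random draw of `module_size`
    genes, without replacement, from a background of `n_genes`, of which
    `n_disease` are disease (e.g. GWAS) genes. This is an illustrative
    test, not necessarily the one used in the paper.
    """
    max_overlap = min(n_disease, module_size)
    total = comb(n_genes, module_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_disease, k) * comb(n_genes - n_disease, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20,000 background genes, 200 GWAS genes,
# and a 100-gene module that contains 6 of them.
p = enrichment_p_value(n_genes=20000, n_disease=200, module_size=100, overlap=6)
print(f"enrichment p-value: {p:.3g}")
```

With these toy numbers the expected overlap under the null is one gene, so an overlap of six yields a small p-value. In practice such a test would be repeated for the modules derived from each layer and corrected for multiple testing.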

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Disease / genetics*
  • Gene Expression Profiling
  • Gene Regulatory Networks
  • Genetics
  • Genome-Wide Association Study*
  • Humans
  • Protein Interaction Maps / genetics*
  • Systems Biology