Identifying pathogenic processes by integrating microarray data with prior knowledge

BMC Bioinformatics. 2014 Apr 24;15:115. doi: 10.1186/1471-2105-15-115.

Abstract

Background: Identifying the molecular processes and pathways involved in disease etiology is of great importance. Although various high-throughput methods have been used extensively for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens shows poor overlap with canonical disease pathways. These findings are difficult to interpret, yet crucial for improving the understanding of the molecular processes underlying disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules that share a common cellular function and involve signaling and regulatory events. Specifically, our method uses Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections is used as a prior in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping.
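The general idea of combining an expression-based likelihood with an interaction-based prior can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify the likelihood, the form of the prior, or the sampler, so everything below is an assumption made for illustration. The sketch assumes a Gaussian likelihood around plug-in group mean profiles (a simplification of a full Bayesian treatment), a Potts-style prior that rewards known interactions falling within one group, a fixed number of groups `k`, and a single-gene Metropolis update. The names `log_likelihood`, `log_prior`, `mcmc_cluster`, `beta`, and `sigma` are hypothetical.

```python
import numpy as np


def log_likelihood(X, labels, k, sigma=1.0):
    """Gaussian log-likelihood (up to a constant) of each gene's expression
    profile around the mean profile of its group. Assumed form, for illustration."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) == 0:
            continue  # an empty group contributes nothing
        resid = members - members.mean(axis=0)
        total -= 0.5 * np.sum(resid ** 2) / sigma ** 2
    return total


def log_prior(labels, adjacency, beta=1.0):
    """Potts-style log-prior (assumed form): add beta for every known
    interaction whose two genes are assigned to the same group."""
    same_group = labels[:, None] == labels[None, :]
    # Count each undirected edge once by using the strict upper triangle.
    return beta * np.sum(np.triu(adjacency * same_group, k=1))


def mcmc_cluster(X, adjacency, k=2, n_iter=2000, beta=1.0, sigma=1.0, seed=0):
    """Metropolis sampler over group assignments.

    X         : (genes x samples) expression matrix
    adjacency : (genes x genes) symmetric 0/1 matrix of known interactions
    Returns the highest-scoring assignment visited and its log-posterior score.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[0]
    labels = rng.integers(0, k, size=n_genes)
    current = log_likelihood(X, labels, k, sigma) + log_prior(labels, adjacency, beta)
    best_labels, best_score = labels.copy(), current

    for _ in range(n_iter):
        # Symmetric proposal: move one randomly chosen gene to a random group.
        proposal = labels.copy()
        proposal[rng.integers(n_genes)] = rng.integers(k)
        score = (log_likelihood(X, proposal, k, sigma)
                 + log_prior(proposal, adjacency, beta))
        # Metropolis acceptance on the log scale.
        if np.log(rng.random()) < score - current:
            labels, current = proposal, score
            if current > best_score:
                best_labels, best_score = labels.copy(), current

    return best_labels, best_score
```

In this sketch, `beta` controls how strongly the prior knowledge pulls interacting genes into the same group; setting `beta=0` reduces it to clustering on the expression data alone. Group labels are arbitrary, so only the partition they induce is meaningful.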

Results: Simulation results showed that the method improved the ability to identify the correct groups compared to traditional clustering, especially for small sample sizes. Applied to a microarray heart failure dataset, the method found one large cluster with several genes important for the structure of the extracellular matrix and a smaller group with many genes involved in carbohydrate metabolism. The method was also applied to a microarray dataset on melanoma patients with or without metastasis, where the main cluster was dominated by genes related to keratinocyte differentiation.

Conclusion: Our method found clusters overlapping with known pathogenic processes, but also pointed to new connections extending beyond the classical pathways.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Bayes Theorem
  • Cluster Analysis
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks
  • Heart Failure / genetics
  • Heart Failure / metabolism
  • Humans
  • Markov Chains
  • Melanoma / genetics
  • Melanoma / metabolism
  • Mice
  • Monte Carlo Method
  • Oligonucleotide Array Sequence Analysis / methods*
  • Protein Interaction Mapping
  • Sequence Homology, Amino Acid
  • Transcription Factors / metabolism

Substances

  • Transcription Factors