Gene coexpression measures in large heterogeneous samples using count statistics

Proc Natl Acad Sci U S A. 2014 Nov 18;111(46):16371-6. doi: 10.1073/pnas.1417128111. Epub 2014 Oct 6.

Abstract

With high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed and is still widely used for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks from gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, patterns of gene association may well change across samples or exist only in a subset of them, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of the distributions and power of both statistics, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
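
The idea of counting local rank patterns can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the statistics defined in the paper: the function names `rank_pattern` and `local_pattern_count`, the window length `k`, the use of consecutive windows (suited to time-course data), and the separate tally of reversed patterns are all choices made here for demonstration, and ties are assumed absent.

```python
import numpy as np


def rank_pattern(window):
    """Return the within-window ranks (0 = smallest) as a tuple.

    Assumes the window has no tied values.
    """
    return tuple(np.argsort(np.argsort(window)))


def local_pattern_count(x, y, k=3):
    """Count consecutive length-k windows where x and y share a rank pattern.

    Hypothetical illustration only. Returns (same, opposite):
      same     = windows where x and y have identical rank patterns
      opposite = windows where x matches the rank pattern of -y
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if not 2 <= k <= len(x):
        raise ValueError("k must satisfy 2 <= k <= len(x)")

    same = 0
    opposite = 0
    for start in range(len(x) - k + 1):
        px = rank_pattern(x[start:start + k])
        py = rank_pattern(y[start:start + k])
        py_reversed = rank_pattern(-y[start:start + k])
        same += int(px == py)
        opposite += int(px == py_reversed)
    return same, opposite


# Two profiles that move together early on and in opposite directions later.
x = [1, 2, 3, 4, 5, 6]
y = [10, 20, 30, 5, 4, 3]
print(local_pattern_count(x, y, k=3))
```

Because only within-window ranks are compared, a single extreme value changes the count for at most the `k` windows that contain it, which gives some intuition for the robustness to outliers noted in the abstract. A count of this kind would still need a null distribution, for example from random permutations, before it could be read as evidence of association.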

Keywords: Stein's approximation; bivariate association; local rank patterns; random permutation statistics.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Algorithms
  • Arabidopsis / genetics
  • Arabidopsis / metabolism
  • Arabidopsis Proteins / biosynthesis
  • Arabidopsis Proteins / genetics
  • Cell Cycle Proteins / biosynthesis
  • Cell Cycle Proteins / genetics
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data*
  • Computer Simulation
  • Gene Expression Profiling / statistics & numerical data*
  • Gene Expression Regulation, Fungal
  • Gene Expression Regulation, Plant
  • Gene Regulatory Networks*
  • Genes, Fungal
  • Genes, Plant
  • Models, Genetic
  • Monte Carlo Method
  • Saccharomyces cerevisiae / cytology
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae Proteins / biosynthesis
  • Saccharomyces cerevisiae Proteins / genetics
  • Time Factors

Substances

  • Arabidopsis Proteins
  • Cell Cycle Proteins
  • Saccharomyces cerevisiae Proteins