Uniform approximation is more appropriate for Wilcoxon Rank-Sum Test in gene set analysis

PLoS One. 2012;7(2):e31505. doi: 10.1371/journal.pone.0031505. Epub 2012 Feb 7.

Abstract

Gene set analysis is widely used to facilitate biological interpretation in analyses of differential expression from high-throughput profiling data. The Wilcoxon Rank-Sum (WRS) test is one of the commonly used methods in gene set enrichment analysis. It compares the ranks of genes in a gene set against those of genes outside the gene set. This method is easy to implement, and it eliminates the dichotomization of genes into significant and non-significant in competitive hypothesis testing. Due to the large number of genes being examined, it is impractical to calculate the exact null distribution for the WRS test. Therefore, the normal distribution is commonly used as an approximation. However, as we demonstrate in this paper, the normal approximation is problematic when a gene set with a relatively small number of genes is tested against the large number of genes in the complementary set. In this situation, a uniform approximation is substantially more powerful, more accurate, and less computationally intensive. We demonstrate the advantage of the uniform approximation in Gene Ontology (GO) term analysis using simulations and real data sets.
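The problem with the normal approximation for small gene sets can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not reproduce the paper's uniform approximation, whose details are not given in the abstract. The function names, the convention that a higher rank means stronger differential expression, the absence of ties, and the omission of a continuity correction are all assumptions made for illustration. The sketch uses the limiting case of a one-gene set, where the null distribution of the rank sum is exactly uniform on 1..N, and compares that exact tail probability with the normal approximation.

```python
import math


def wrs_normal_upper_p(set_ranks, n_total):
    """Upper-tail p-value of the rank sum under the normal approximation.

    Assumptions (illustrative): ranks run from 1 to n_total, there are
    no ties, and no continuity correction is applied.
    """
    m = len(set_ranks)
    w = sum(set_ranks)
    # Null mean and variance of the rank sum of m ranks drawn from 1..n_total.
    mean = m * (n_total + 1) / 2
    var = m * (n_total - m) * (n_total + 1) / 12
    z = (w - mean) / math.sqrt(var)
    # Standard normal upper tail, P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2))


def single_gene_exact_upper_p(rank, n_total):
    """Exact upper-tail p-value for a one-gene set.

    Under the null, the rank of a single gene is uniform on 1..n_total,
    so P(R >= rank) = (n_total - rank + 1) / n_total.
    """
    return (n_total - rank + 1) / n_total


n_total = 10_000          # hypothetical number of genes on the platform
top_rank = [n_total]      # a one-gene set holding the top-ranked gene

normal_p = wrs_normal_upper_p(top_rank, n_total)
exact_p = single_gene_exact_upper_p(top_rank[0], n_total)

print(f"normal approximation: {normal_p:.4f}")
print(f"exact (uniform):      {exact_p:.4f}")
```

In this extreme case the exact p-value is 1/10000, while the normal approximation yields a value of roughly 0.04, so the most extreme possible result would barely reach the conventional 0.05 threshold. This is consistent with the loss of power described above for small gene sets tested against a large complementary set.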

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / standards
  • Research Design
  • Sample Size
  • Statistics, Nonparametric*