A distribution free summarization method for Affymetrix GeneChip arrays

Bioinformatics. 2007 Feb 1;23(3):321-7. doi: 10.1093/bioinformatics/btl609. Epub 2006 Dec 5.

Abstract

Motivation: Affymetrix GeneChip arrays require summarization to combine the probe-level intensities into one value representing the expression level of a gene. However, probe intensity measurements are expected to be affected by different levels of non-specific hybridization and cross-hybridization to non-target transcripts. Here, we present a new summarization technique, the Distribution Free Weighted method (DFW), which uses information about the variability in probe behavior to estimate the extent of non-specific and cross-hybridization for each probe. The contribution of each probe is weighted accordingly during summarization, without making any distributional assumptions about the probe-level data.
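The idea of weighting probes by their variability can be illustrated with a minimal Python sketch. It is not the published DFW formula, which the abstract does not give. The function names, the use of the across-array standard deviation as the variability measure, and the default rule that weight grows with variability are all assumptions made for illustration. The sketch uses only empirical statistics of the data, so no distribution is fitted.

```python
import numpy as np

def probe_variability(log_pm):
    """Per-probe standard deviation across arrays.

    log_pm: probes x arrays matrix of log2 intensities for one probe set.
    Needs at least two arrays. Using the SD here is an assumption.
    """
    return np.asarray(log_pm, dtype=float).std(axis=1, ddof=1)

def weighted_summary(log_pm, weight_fn=lambda v: v):
    """Collapse a probes x arrays matrix into one value per array.

    weight_fn maps per-probe variability to a non-negative raw weight.
    The default (weight proportional to variability) is an illustrative
    assumption, not the weighting used by DFW.
    """
    log_pm = np.asarray(log_pm, dtype=float)
    raw = np.asarray(weight_fn(probe_variability(log_pm)), dtype=float)
    if np.any(raw < 0) or raw.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = raw / raw.sum()   # normalize so the weights sum to 1
    return w @ log_pm     # weighted average of the probes, per array

# Made-up log2 intensities: 3 probes (rows) measured on 4 arrays (columns).
example = [[7.0, 8.0, 9.0, 10.0],
           [7.5, 8.5, 9.5, 10.5],
           [9.0, 9.0, 9.0, 9.0]]   # a flat probe gets zero weight here
print(weighted_summary(example))
```

Passing a different weight_fn changes how variability is translated into a weight, so the direction and strength of the weighting stay an explicit choice of the caller.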

Results: We compare DFW with several popular summarization methods on spike-in datasets, via both our own calculations and the 'Affycomp II' competition. The results show that DFW outperforms other methods when sensitivity and specificity are considered simultaneously. With the Affycomp spike-in datasets, the area under the receiver operating characteristic curve for DFW is nearly 1.0 (a perfect value), indicating that DFW can identify all differentially expressed genes with few false positives. The approach is also computationally faster than most other methods in current use.
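For readers unfamiliar with the metric, the area under the receiver operating characteristic curve can be computed on a spike-in dataset as the fraction of (spiked, non-spiked) gene pairs in which the spiked gene receives the higher score. The Python sketch below uses that pairwise definition. The function name and the example scores are hypothetical, and the Affycomp assessment may construct its curves differently.

```python
def roc_auc(scores, is_spiked):
    """Area under the ROC curve via pairwise comparison.

    scores: one differential-expression score per gene (higher = more evidence).
    is_spiked: True for genes known to be spiked in, False otherwise.
    Ties between a spiked and a non-spiked gene count as half a win.
    """
    pos = [s for s, y in zip(scores, is_spiked) if y]
    neg = [s for s, y in zip(scores, is_spiked) if not y]
    if not pos or not neg:
        raise ValueError("need at least one spiked and one non-spiked gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores: both spiked genes outrank both non-spiked genes.
print(roc_auc([3.0, 2.5, 0.1, 0.2], [True, True, False, False]))
```

A value of 1.0 means every spiked-in gene is ranked above every non-spiked gene, and 0.5 corresponds to a random ranking.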

Availability: The R code for DFW is available upon request.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Gene Expression Profiling / instrumentation*
  • Gene Expression Profiling / methods*
  • Models, Genetic
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / instrumentation*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Statistical Distributions