A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data

Nucleic Acids Res. 2010 Jan;38(3):e17. doi: 10.1093/nar/gkp942. Epub 2009 Nov 18.

Abstract

Illumina BeadArrays are among the most popular and reliable platforms for gene expression profiling. However, little external scrutiny has been given to the design, selection and annotation of BeadArray probes, which is a fundamental issue in data quality and interpretation. Here we present a pipeline for the complete genomic and transcriptomic re-annotation of Illumina probe sequences, also applicable to other platforms, with its output available through a Web interface and incorporated into Bioconductor packages. We have identified several problems with the design of individual probes and we show the benefits of probe re-annotation on the analysis of BeadArray gene expression data sets. We discuss the importance of aspects such as probe coverage of individual transcripts, alternative messenger RNA splicing, single-nucleotide polymorphisms, repeat sequences, RNA degradation biases and probes targeting genomic regions with no known transcription. We conclude that many of the Illumina probes have unreliable original annotation and that our re-annotation allows analyses to focus on the good quality probes, which form the majority, and also to expand the scope of biological information that can be extracted.
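As a rough illustration of how re-annotation output might be used to focus an analysis on good quality probes, the following minimal sketch filters expression values by per-probe annotation flags. Everything in it is an assumption made for illustration: the `ProbeAnnotation` fields, the `is_reliable` criteria and the `filter_expression` helper are hypothetical, and they do not reproduce the quality grading or data structures of the published pipeline or its Bioconductor packages.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ProbeAnnotation:
    """Hypothetical re-annotation record for one probe (fields are illustrative)."""
    probe_id: str
    transcript_hits: int    # number of known transcripts the probe sequence matches
    mismatches: int         # base mismatches in the best alignment
    overlaps_snp: bool      # probe sequence covers a known SNP
    overlaps_repeat: bool   # probe sequence falls in a repeat region


def is_reliable(probe: ProbeAnnotation, max_mismatches: int = 0) -> bool:
    """Assumed rule: keep probes that hit at least one transcript, stay within
    the mismatch limit, and avoid known SNPs and repeats."""
    return (
        probe.transcript_hits >= 1
        and probe.mismatches <= max_mismatches
        and not probe.overlaps_snp
        and not probe.overlaps_repeat
    )


def filter_expression(
    expression: Dict[str, List[float]],
    annotations: Iterable[ProbeAnnotation],
) -> Dict[str, List[float]]:
    """Return only the expression rows whose probe passes is_reliable."""
    reliable_ids = {p.probe_id for p in annotations if is_reliable(p)}
    return {pid: values for pid, values in expression.items() if pid in reliable_ids}
```

In practice the thresholds would follow whatever quality categories the re-annotation actually reports. The boolean flags here only mirror the kinds of issues listed in the abstract (mismatches, single-nucleotide polymorphisms, repeat sequences, probes with no known transcript).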

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing
  • Base Pair Mismatch
  • Gene Expression Profiling / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Probes / chemistry*
  • Polymorphism, Single Nucleotide
  • Repetitive Sequences, Nucleic Acid
  • Software

Substances

  • Oligonucleotide Probes