Identification and prevention of a GC content bias in SAGE libraries

Nucleic Acids Res. 2001 Jun 15;29(12):e60. doi: 10.1093/nar/29.12.e60.

Abstract

Serial Analysis of Gene Expression (SAGE) is becoming a widely used gene expression profiling method for the study of development, cancer and other human diseases. Investigators using SAGE rely heavily on the quantitative aspect of this method for cataloging gene expression and comparing multiple SAGE libraries. We have developed additional computational and statistical tools to assess the quality and reproducibility of a SAGE library. Using these methods, a critical variable in the SAGE protocol was identified that has the potential to bias the Tag distribution relative to the GC content of the 10 bp SAGE Tag DNA sequence. We also detected this bias in a number of publicly available SAGE libraries. It is important to note that the GC content bias went undetected by quality control procedures in the current SAGE protocol and was only identified with the use of these statistical analyses on as few as 750 SAGE Tags. In addition to keeping any solution of free DiTags on ice, an analysis of the GC content should be performed before sequencing large numbers of SAGE Tags to be confident that SAGE libraries are free from experimental bias.
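The abstract reports that a GC-content analysis of as few as 750 Tags was enough to expose the bias. The Python below is a minimal sketch of that kind of check. It is not the authors' published procedure. It counts G and C bases in each 10 bp Tag, builds a GC-count histogram, and estimates a Monte Carlo p-value for the library's mean GC fraction against a reference Tag set. The reference set (for example, Tags from a library believed to be unbiased), the function names, and the choice of mean GC fraction as the test statistic are all illustrative assumptions.

```python
import random
from collections import Counter

TAG_LENGTH = 10  # SAGE Tags are 10 bp


def gc_count(tag: str) -> int:
    """Number of G or C bases in one Tag."""
    return sum(1 for base in tag.upper() if base in "GC")


def gc_histogram(tags):
    """Number of Tags with 0, 1, ..., TAG_LENGTH G/C bases."""
    counts = Counter(gc_count(t) for t in tags)
    return [counts.get(k, 0) for k in range(TAG_LENGTH + 1)]


def mean_gc_fraction(tags):
    """Mean GC fraction across all bases of all Tags."""
    return sum(gc_count(t) for t in tags) / (TAG_LENGTH * len(tags))


def monte_carlo_p_value(tags, reference_tags, n_iter=10000, seed=0):
    """Two-sided Monte Carlo p-value (hypothetical test, not from the paper).

    Draws random samples of len(tags) Tags, with replacement, from
    reference_tags and counts how often a sample's mean GC fraction lies at
    least as far from the reference mean as the observed library does.
    """
    rng = random.Random(seed)
    ref_mean = mean_gc_fraction(reference_tags)
    observed = abs(mean_gc_fraction(tags) - ref_mean)
    n = len(tags)
    extreme = 0
    for _ in range(n_iter):
        sample = rng.choices(reference_tags, k=n)
        if abs(mean_gc_fraction(sample) - ref_mean) >= observed:
            extreme += 1
    # The +1 terms keep the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Synthetic data for illustration only: a GC-rich library vs. a mixed reference.
    reference = ["AAAAAAAAAA", "GGGGGGGGGG"] * 50
    library = ["GGGGGGGGGG"] * 100
    print(gc_histogram(library))
    print(monte_carlo_p_value(library, reference, n_iter=1000))
```

A small p-value flags a library whose GC composition departs from the reference more than sampling noise would explain. Because the check needs only Tag sequences, it can be run on a first batch of a few hundred Tags before committing to large-scale sequencing, in the spirit of the abstract's recommendation.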

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Base Composition*
  • Bias
  • Brain / metabolism
  • Extremities / embryology
  • Gene Expression Profiling / methods*
  • Gene Library*
  • Male
  • Mice
  • Monte Carlo Method
  • Quality Control
  • Reproducibility of Results
  • Temperature