A rapid method for computationally inferring transcriptome coverage and microarray sensitivity

Bioinformatics. 2005 Jan 1;21(1):80-9. doi: 10.1093/bioinformatics/bth472. Epub 2004 Aug 12.

Abstract

Motivation: Many gene expression technologies are in use, including cDNA and oligonucleotide-based microarrays, SAGE and MPSS. Coverage of the transcriptome and of the genome differs from one organism of interest to the next. We ask what level of coverage is required to exploit the sensitivity of each technology, and how sensitive each approach is in an experimental study.

Results: We estimate transcriptome coverage by randomly sampling transcripts from a pre-defined tag-to-gene mapping function. For a given microarray experiment, we locate the intensity thresholds that define the distribution of transcript abundance. These values are compared against the distribution obtained by applying the same thresholds to the intensities of differentially expressed genes. The point at which the ratio of these two distributions reaches equilibrium defines the sensitivity. We conclude that a collection of approximately 340,000 sequences is adequate for microarrays, but not large enough for maximum utilization of tag-based technologies. In the absence of large-scale sequencing, the majority of the tags detected by tag-based approaches will remain unidentified until the genome sequence is available.
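The sampling idea in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `estimate_coverage`, the toy tag-to-gene mapping, and the Zipf-like abundance weights are all assumptions made for illustration. The sketch draws tags at random in proportion to an assumed abundance and reports the fraction of genes hit at least once.

```python
import random


def estimate_coverage(tag_to_gene, abundance, n_samples, seed=0):
    """Return the fraction of genes observed at least once.

    tag_to_gene: dict mapping each tag to the gene it identifies (hypothetical input).
    abundance:   dict mapping each tag to a relative abundance weight (assumed).
    n_samples:   number of tags drawn at random, with replacement.
    """
    rng = random.Random(seed)
    tags = list(tag_to_gene)
    weights = [abundance[t] for t in tags]
    # Draw tags in proportion to their assumed abundance.
    drawn = rng.choices(tags, weights=weights, k=n_samples)
    genes_seen = {tag_to_gene[t] for t in drawn}
    total_genes = len(set(tag_to_gene.values()))
    return len(genes_seen) / total_genes


# Toy data (illustrative only): 100 genes, two tags per gene,
# with abundance falling off as 1/rank.
toy_mapping = {f"tag{i}": f"gene{i // 2}" for i in range(200)}
toy_abundance = {t: 1.0 / (i + 1) for i, t in enumerate(toy_mapping)}

for n in (50, 500, 5000):
    print(n, estimate_coverage(toy_mapping, toy_abundance, n))
```

Running the loop for increasing sample sizes gives a rough coverage curve. Under a skewed abundance distribution such as the one assumed here, rare transcripts are the last to be sampled.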

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Chromosome Mapping / methods*
  • Expressed Sequence Tags
  • Gene Expression Profiling / methods*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Proteome / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*
  • Transcription Factors / genetics*

Substances

  • Proteome
  • Transcription Factors