Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

Proc Natl Acad Sci U S A. 2012 Jan 24;109(4):1347-52. doi: 10.1073/pnas.1118018109. Epub 2012 Jan 9.

Abstract

RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but it is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we apply paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq.
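
The counting rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The barcode set, the one-mismatch tolerance, the gene labels standing in for mapped cDNA sequences, and the function names (`hamming`, `nearest_barcode`, `digital_counts`) are all assumptions made for illustration. Each read is assigned to a known barcode pair if both barcodes can be identified unambiguously, and abundance is the number of distinct barcode pairs per cDNA rather than the number of reads.

```python
from collections import Counter, defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def nearest_barcode(observed, barcodes, max_mismatches):
    """Return the single known barcode within max_mismatches, else None."""
    hits = [b for b in barcodes
            if len(b) == len(observed) and hamming(observed, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

def digital_counts(reads, barcodes, max_mismatches=1):
    """Count distinct (left, right) barcode pairs observed for each cDNA."""
    seen = defaultdict(set)
    for left, cdna, right in reads:
        l = nearest_barcode(left, barcodes, max_mismatches)
        r = nearest_barcode(right, barcodes, max_mismatches)
        if l is None or r is None:
            continue  # discard reads whose barcodes cannot be identified
        seen[cdna].add((l, r))
    return {cdna: len(pairs) for cdna, pairs in seen.items()}

# Hypothetical barcode set; real barcodes would be longer and more numerous.
BARCODES = ["AAAAA", "CCCCC", "GGGGG", "TTTTT"]

# Each read: (left barcode, cDNA label, right barcode). Toy data only.
reads = [
    ("AAAAA", "geneA", "CCCCC"),  # first geneA molecule
    ("AAAAA", "geneA", "CCCCC"),  # PCR duplicate of the same molecule
    ("AAAAT", "geneA", "CCCCC"),  # same molecule, one sequencing error
    ("GGGGG", "geneA", "TTTTT"),  # second geneA molecule
    ("TTTTT", "geneB", "AAAAA"),  # one geneB molecule
    ("ACGTA", "geneB", "AAAAA"),  # left barcode not identifiable, discarded
]

read_counts = dict(Counter(cdna for _, cdna, _ in reads))
molecule_counts = digital_counts(reads, BARCODES)

print("reads per cDNA:    ", read_counts)
print("molecules per cDNA:", molecule_counts)
```

In this toy input, geneA has four reads but only two distinct barcode pairs, so the read count is inflated by amplification while the barcode count is not. Keeping the known barcodes far apart in Hamming distance is what lets a read with a sequencing error still be assigned to a single barcode.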

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Barcoding, Taxonomic / methods*
  • DNA, Complementary / genetics
  • Escherichia coli / genetics
  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Polymerase Chain Reaction / methods
  • Sequence Analysis, RNA / methods*
  • Systems Biology / methods*

Substances

  • DNA, Complementary

Associated data

  • GEO/GSE34449