Benchmarking next-generation transcriptome sequencing for functional and evolutionary genomics

Mol Biol Evol. 2009 Dec;26(12):2731-44. doi: 10.1093/molbev/msp188. Epub 2009 Aug 25.

Abstract

Next-generation sequencing has opened the door to genomic analysis of nonmodel organisms. Technologies generating long-sequence reads (200-400 bp) are increasingly used in evolutionary studies of nonmodel organisms, but the short-sequence reads (30-50 bp) that can be produced at lower cost are thought to be of limited utility for de novo sequencing applications. Here, we tested this assumption by short-read sequencing the transcriptomes of the tropical disease vectors Aedes aegypti and Anopheles gambiae, for which complete genome sequences are available. Comparison of our results to the reference genomes allowed us to accurately evaluate the quantity, quality, and functional and evolutionary information content of our "test" data. We produced more than 0.7 billion nucleotides of sequenced data per species that assembled into more than 21,000 test contigs larger than 100 bp per species and covered approximately 27% of the Aedes reference transcriptome. Remarkably, the substitution error rate in the test contigs was approximately 0.25% per site, with very few indels or assembly errors. Test contigs of both species were enriched for genes involved in energy production and protein synthesis and underrepresented in genes involved in transcription and differentiation. Ortholog prediction using the test contigs was accurate across hundreds of millions of years of evolution. Our results demonstrate the considerable utility of short-read transcriptome sequencing for genomic studies of nonmodel organisms and suggest an approach for assessing the information content of next-generation data for evolutionary studies.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Aedes / genetics*
  • Alternative Splicing / genetics
  • Amino Acid Substitution / genetics
  • Animals
  • Anopheles / genetics*
  • Contig Mapping
  • Evolution, Molecular*
  • Female
  • Gene Expression Profiling / methods*
  • Genomics / methods*
  • INDEL Mutation / genetics
  • Male
  • Oligonucleotide Array Sequence Analysis
  • Proteome / genetics
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Reference Standards
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid

Substances

  • Proteome
  • RNA, Messenger