Gene identification and expression analysis of 86,136 Expressed Sequence Tags (EST) from the rice genome

Genomics Proteomics Bioinformatics. 2003 Feb;1(1):26-42. doi: 10.1016/s1672-0229(03)01005-2.

Abstract

Expressed Sequence Tag (EST) analysis has pioneered genome-wide gene discovery and expression profiling. To establish a gene expression index for indica rice, we sequenced and analyzed 86,136 ESTs from nine rice cDNA libraries derived from the super hybrid cultivar LYP9 and its parental cultivars. We assembled these ESTs into 13,232 contigs, leaving 8,976 singletons. Overall, 7,497 sequences were similar to existing sequences in GenBank and 14,711 were novel. These sequences were classified by molecular function, biological process, and pathway according to the Gene Ontology. We compared our ESTs with the 95,000 publicly available ESTs from japonica and found little sequence variation, despite the large differences between the two genome sequences. We then assembled the combined 173,000 rice ESTs for further analysis. Using the pooled ESTs, we compared gene expression in metabolic pathways between rice and Arabidopsis according to KEGG. We further profiled gene expression patterns in different tissues, at different developmental stages, and in a conditional sterile mutant, after confirming by means of sequence coverage that the libraries were comparable. We also identified some possible library-specific genes and a number of enzymes and transcription factors that contribute to rice development.
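
The assembly and annotation counts reported above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the mean contig depth rests on the assumption (not stated in the abstract) that all 86,136 ESTs entered the assembly and that each singleton corresponds to exactly one EST.

```python
# Counts taken directly from the abstract.
total_ests = 86_136
contigs = 13_232
singletons = 8_976
genbank_similar = 7_497
novel = 14_711

# Contigs plus singletons give the number of distinct assembled sequences.
unique_sequences = contigs + singletons

# The GenBank-similar and novel sets should partition the same total.
assert unique_sequences == genbank_similar + novel

# Fraction of assembled sequences with no GenBank match.
novel_fraction = novel / unique_sequences

# ASSUMPTION: every EST was assembled and each singleton is one EST,
# so the remaining ESTs are distributed across the contigs.
ests_in_contigs = total_ests - singletons
mean_ests_per_contig = ests_in_contigs / contigs

print(f"unique sequences: {unique_sequences}")
print(f"novel fraction: {novel_fraction:.1%}")
print(f"mean ESTs per contig (under the stated assumption): {mean_ests_per_contig:.2f}")
```

Both partitions sum to the same number of distinct sequences, so the reported figures are internally consistent; the per-contig depth is only as reliable as the assumption behind it.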

MeSH terms

  • Arabidopsis / genetics
  • DNA, Complementary / metabolism
  • Databases as Topic
  • Expressed Sequence Tags*
  • Gene Library
  • Genome, Plant*
  • Genomics / methods*
  • Multigene Family
  • Open Reading Frames
  • Oryza / genetics*
  • Quality Control
  • Software

Substances

  • DNA, Complementary