Large-scale collection and annotation of gene models for date palm (Phoenix dactylifera, L.)

Plant Mol Biol. 2012 Aug;79(6):521-36. doi: 10.1007/s11103-012-9924-z. Epub 2012 Jun 27.

Abstract

The date palm (Phoenix dactylifera L.), famed for its sugar-rich fruits (dates) and cultivated by humans since 4,000 B.C., is an economically important crop in the Middle East, Northern Africa, and increasingly in other regions with suitable climates. Despite this long history of cultivation, the understanding of P. dactylifera genetics and molecular biology is rather limited, hindered by a lack of high-quality basic genomic and transcriptomic data. Here we report a large-scale effort to generate gene models (expressed sequence tags, or ESTs, assembled and mapped to a genome assembly) for P. dactylifera, using the long-read pyrosequencing platform (Roche/454 GS FLX Titanium) at high coverage. We built fourteen cDNA libraries from different P. dactylifera tissues (cultivar Khalas) and acquired 15,778,993 raw sequencing reads (about one million reads per library); the pooled sequences were assembled into 67,651 non-redundant contigs and 301,978 singletons. We annotated 52,725 contigs based on the plant databases and 45 contigs based on functional domains referenced against the Pfam database. From the annotated contigs, we assigned GO (Gene Ontology) terms to 36,086 contigs and KEGG pathways to 7,032 contigs. Our comparative analysis showed that 70.6% (47,930), 69.4% (47,089), 68.4% (46,441), and 69.3% (47,048) of the P. dactylifera gene models are shared with rice, sorghum, Arabidopsis, and grapevine, respectively. We also classified our gene models as house-keeping or tissue-specific genes according to their tissue specificity.
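
As a rough sanity check on the counts reported in the abstract, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the authors' pipeline: the variable names are hypothetical, and it assumes that the 45 Pfam-annotated contigs are in addition to the 52,725 annotated from the plant databases, and that the GO and KEGG counts are best expressed as shares of that combined annotated set.

```python
# Counts copied from the abstract; the names are illustrative only.
RAW_READS = 15_778_993
LIBRARIES = 14
CONTIGS = 67_651
ANNOTATED_PLANT_DB = 52_725
ANNOTATED_PFAM = 45  # assumption: counted in addition to the plant-database set
GO_CONTIGS = 36_086
KEGG_CONTIGS = 7_032


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Average sequencing depth per cDNA library.
reads_per_library = RAW_READS / LIBRARIES

# Annotated contigs, as a total and as a share of all assembled contigs.
annotated_total = ANNOTATED_PLANT_DB + ANNOTATED_PFAM
annotated_pct = share(annotated_total, CONTIGS)

# GO and KEGG assignments as shares of the annotated contigs (assumed denominator).
go_pct = share(GO_CONTIGS, annotated_total)
kegg_pct = share(KEGG_CONTIGS, annotated_total)

print(f"reads per library:  {reads_per_library:,.0f}")
print(f"annotated contigs:  {annotated_total:,} ({annotated_pct:.1f}% of contigs)")
print(f"GO-assigned share:   {go_pct:.1f}% of annotated contigs")
print(f"KEGG-assigned share: {kegg_pct:.1f}% of annotated contigs")
```

The per-library figure comes out slightly above one million, consistent with the "about one million reads per library" stated in the abstract. The species-comparison percentages are left out of the sketch because the abstract does not state the denominator used for them.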

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arecaceae / genetics*
  • DNA, Complementary / genetics
  • DNA, Plant / genetics
  • Databases, Genetic
  • Expressed Sequence Tags
  • Flowers / genetics
  • Fruit / genetics
  • Gene Expression Profiling
  • Gene Expression Regulation, Plant
  • Genome, Plant*
  • Genomics / methods
  • Metabolic Networks and Pathways / genetics
  • Models, Genetic
  • Plant Leaves / genetics
  • Plant Proteins / genetics
  • Plant Proteins / metabolism
  • Plant Roots / genetics
  • RNA, Plant / genetics*

Substances

  • DNA, Complementary
  • DNA, Plant
  • Plant Proteins
  • RNA, Plant