Incorporating the human gene annotations in different databases significantly improved transcriptomic and genetic analyses

RNA. 2013 Apr;19(4):479-89. doi: 10.1261/rna.037473.112. Epub 2013 Feb 19.

Abstract

Human gene annotation is crucial for transcriptomic and genetic studies; however, the impact of the human gene annotations in different databases on such studies has rarely been evaluated. To enable full use of the available human annotation resources and to better understand the human transcriptome, we systematically compared the human annotations in RefSeq, Ensembl (GENCODE), and AceView across diverse transcriptomic and genetic analyses. We found that the human gene annotations in the three databases are far from complete. Although Ensembl and AceView annotate more genes than RefSeq, more than 15,800 genes from Ensembl (or AceView) lie within the intergenic and intronic regions of the AceView (or Ensembl) annotation. The human transcriptome annotations in RefSeq, Ensembl, and AceView had distinct effects on short-read mapping, gene and isoform expression profiling, and differential expression calling. Furthermore, our findings indicate that integrating the annotations of these databases yields a more complete gene set and significantly enhances those transcriptomic analyses. We also observed that many more known SNPs are located within genes annotated in Ensembl and AceView than within genes annotated in RefSeq. In particular, 1033 of 3041 trait/disease-associated SNPs, involved in about 200 human traits/diseases and previously reported to lie in RefSeq intergenic regions, could be relocated within Ensembl and AceView genes. Our findings illustrate that a more complete transcriptome, generated by incorporating human gene annotations from diverse databases, can strikingly improve the overall results of transcriptomic and genetic studies.
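The SNP relocation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`build_index`, `is_genic`, `relocated_snps`), the closed-interval coordinate convention, and all gene and SNP coordinates below are hypothetical and chosen only for illustration. The sketch checks which SNPs fall outside every gene of a baseline annotation but inside a gene of at least one other annotation.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(genes):
    """Merge overlapping (chrom, start, end) gene spans per chromosome.

    Coordinates are treated as closed intervals (an assumption of this sketch).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping genes collapse into one genic span.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def is_genic(index, chrom, pos):
    """Return True if pos lies inside any merged gene span on chrom."""
    spans = index.get(chrom, [])
    # Locate the last span whose start is <= pos.
    i = bisect_right(spans, (pos, float("inf"))) - 1
    return i >= 0 and spans[i][0] <= pos <= spans[i][1]


def relocated_snps(snps, baseline_genes, other_annotations):
    """IDs of SNPs intergenic in the baseline but genic in another annotation."""
    baseline = build_index(baseline_genes)
    others = [build_index(genes) for genes in other_annotations]
    hits = []
    for snp_id, chrom, pos in snps:
        if is_genic(baseline, chrom, pos):
            continue
        if any(is_genic(idx, chrom, pos) for idx in others):
            hits.append(snp_id)
    return hits


# Toy data only: invented coordinates, not taken from any real database.
refseq = [("chr1", 100, 200)]
ensembl = [("chr1", 100, 200), ("chr1", 300, 400)]
aceview = [("chr1", 150, 250), ("chr2", 50, 80)]
snps = [
    ("rs_a", "chr1", 150),  # genic in the baseline
    ("rs_b", "chr1", 350),  # genic only in the second annotation
    ("rs_c", "chr2", 60),   # genic only in the third annotation
    ("rs_d", "chr1", 500),  # intergenic everywhere
]

relocated = relocated_snps(snps, refseq, [ensembl, aceview])
print(relocated)
```

On real data, the gene spans would come from each database's annotation file and the SNP positions from a variant catalog, all on the same genome build. Intronic positions count as genic here because only whole-gene spans are compared.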

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line
  • Chromosomes, Human
  • Databases, Genetic*
  • Disease / genetics
  • Gene Expression Profiling
  • Genome, Human*
  • Humans
  • Molecular Sequence Annotation*
  • Organ Specificity
  • Polymorphism, Single Nucleotide
  • Reference Standards
  • Transcriptome*