Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release

Brian J Haas; Jennifer R Wortman; Catherine M Ronning; Linda I Hannick; Roger K Smith Jr; Rama Maiti; Agnes P Chan; Chunhui Yu; Maryam Farzad; Dongying Wu; Owen White; Christopher D Town

doi:10.1186/1741-7007-3-7

BMC Biol. 2005 Mar 22:3:7. doi: 10.1186/1741-7007-3-7.

Authors

Brian J Haas¹, Jennifer R Wortman, Catherine M Ronning, Linda I Hannick, Roger K Smith Jr, Rama Maiti, Agnes P Chan, Chunhui Yu, Maryam Farzad, Dongying Wu, Owen White, Christopher D Town

Affiliation

¹ The Institute for Genomic Research, 9172 Medical Center Drive, Rockville, Maryland 20850, USA. bhaas@tigr.org

Abstract

Background: Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications.

Results: Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5).

Conclusion: Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one-third of the originally annotated gene models were significantly refined, yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Alternative Splicing / genetics
Arabidopsis / classification*
Arabidopsis / genetics*
Computational Biology / methods*
Computational Biology / standards
Genome, Plant / genetics*
Models, Genetic
Plant Proteins / classification
Plant Proteins / genetics
Sequence Analysis, Protein / methods*
Writing*

Substances

Plant Proteins