Positional candidate gene selection from livestock EST databases using Gene Ontology

Bioinformatics. 2003 Jan 22;19(2):249-55. doi: 10.1093/bioinformatics/19.2.249.

Abstract

Motivation: The number of expressed sequence tags (ESTs) in GenBank has now surpassed 200,000 for cattle and 100,000 for swine. The Institute for Genomic Research (TIGR) has organized these sequences into approximately 60,000 non-redundant consensus sequences (identified by TIGR Gene Indices) for cattle and 40,000 for swine. Anonymous ESTs are of limited value unless they are connected to function. Functional information is difficult to manage electronically because of heterogeneity of meaning and form among databases. The Gene Ontology (GO) Consortium has produced ontologies for gene function with consistent meaning and form across species. Linking livestock ESTs to gene function through similarity with sequences from other annotation-rich mammals could accelerate: (1) the discovery of positional candidate genes underlying a livestock quantitative trait locus (QTL) and (2) comparative mapping between livestock and other mammals (e.g. human, mouse and rat). We initiated this investigation to determine whether incorporating the GO into the annotation process could accelerate livestock positional candidate gene discovery.

Results: We have associated livestock ESTs with GO nodes through sequence similarity to the NCBI Reference Sequences (RefSeq). Positional candidate genes that previously took days to identify are now identified within minutes. The schema described here accommodates queries that return GO nodes from terms familiar to biologists, such as gene name, alternate/alias symbol, and OMIM phenotype.
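
The kind of query described in the Results can be illustrated with a minimal sketch. It is not the authors' schema, which is available only on request. Every table name, column name, identifier and value below is hypothetical toy data, and the assumption that each EST consensus sequence carries a chromosome and map position is made only for illustration. The GO hierarchy and OMIM phenotypes are not modeled.

```python
import sqlite3


def build_demo_db():
    """Build an in-memory toy database (all names and rows are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- EST consensus sequence, its best RefSeq match, and an assumed map position
        CREATE TABLE est_hit (tc_id TEXT, refseq_id TEXT, chrom TEXT, pos_cm REAL);
        -- gene names and alias symbols attached to a RefSeq record
        CREATE TABLE gene_name (refseq_id TEXT, name TEXT);
        -- GO annotations attached to a RefSeq record
        CREATE TABLE gene_go (refseq_id TEXT, go_id TEXT);
        CREATE TABLE go_term (go_id TEXT PRIMARY KEY, label TEXT);
    """)
    conn.executemany("INSERT INTO est_hit VALUES (?, ?, ?, ?)", [
        ("TC_1", "REF_A", "6", 42.0),
        ("TC_2", "REF_B", "6", 47.5),
        ("TC_3", "REF_C", "6", 90.0),
    ])
    conn.executemany("INSERT INTO gene_name VALUES (?, ?)", [
        ("REF_A", "GENE_A"), ("REF_A", "ALIAS_A"),
        ("REF_B", "GENE_B"), ("REF_C", "GENE_C"),
    ])
    conn.executemany("INSERT INTO gene_go VALUES (?, ?)", [
        ("REF_A", "GO_X"), ("REF_B", "GO_Y"), ("REF_C", "GO_X"),
    ])
    conn.executemany("INSERT INTO go_term VALUES (?, ?)", [
        ("GO_X", "toy process X"), ("GO_Y", "toy process Y"),
    ])
    return conn


def go_nodes_for_name(conn, name):
    """Return (go_id, label) pairs for a gene name or alias symbol."""
    return conn.execute("""
        SELECT DISTINCT t.go_id, t.label
        FROM gene_name n
        JOIN gene_go g ON g.refseq_id = n.refseq_id
        JOIN go_term t ON t.go_id = g.go_id
        WHERE n.name = ?
        ORDER BY t.go_id
    """, (name,)).fetchall()


def positional_candidates(conn, chrom, start_cm, end_cm, go_id):
    """Return EST consensus IDs inside an interval whose RefSeq match has go_id."""
    rows = conn.execute("""
        SELECT DISTINCT e.tc_id
        FROM est_hit e
        JOIN gene_go g ON g.refseq_id = e.refseq_id
        WHERE e.chrom = ? AND e.pos_cm BETWEEN ? AND ? AND g.go_id = ?
        ORDER BY e.tc_id
    """, (chrom, start_cm, end_cm, go_id)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(go_nodes_for_name(conn, "ALIAS_A"))
    print(positional_candidates(conn, "6", 40.0, 50.0, "GO_X"))
```

In this toy layout, a biologist's term (a gene name or alias) resolves to GO nodes with one join chain, and a QTL interval plus a GO node narrows the ESTs to a short candidate list with another. A fuller version would also need to follow parent-child relationships between GO nodes.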

Availability: Scripts and schema are available on request from the authors.

Publication types

  • Evaluation Study
  • Validation Study

MeSH terms

  • Animals
  • Animals, Domestic / genetics*
  • Animals, Domestic / physiology
  • Database Management Systems*
  • Databases, Factual
  • Databases, Nucleic Acid*
  • Expressed Sequence Tags*
  • Gene Expression Profiling / methods
  • Information Storage and Retrieval / methods
  • Natural Language Processing
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods
  • Sequence Homology, Nucleic Acid
  • Species Specificity
  • Vocabulary, Controlled*