Identification of OBO nonalignments and its implications for OBO enrichment

Bioinformatics. 2008 Jun 15;24(12):1448-55. doi: 10.1093/bioinformatics/btn194. Epub 2008 May 7.

Abstract

Motivation: Existing projects that focus on the semiautomatic addition of links between existing terms in the Open Biomedical Ontologies can take advantage of reasoners that can make new inferences between terms that are based on the added formal definitions and that reflect nonalignments between the linked terms. However, these projects require that these definitions be necessary and sufficient, a strong requirement that often does not hold. If such definitions cannot be added, the reasoners cannot point to the nonalignments through the suggestion of new inferences.

Results: We describe a methodology by which we have identified over 1900 instances of nonredundant nonalignments between terms from the Gene Ontology (GO) biological process (BP), cellular component (CC) and molecular function (MF) ontologies, Chemical Entities of Biological Interest (ChEBI) and the Cell Type Ontology (CL). Many of the 39.8% of these nonalignments whose object terms are more atomic than the subject terms are not currently examined in other ontology-enrichment projects due to the fact that the necessary and sufficient conditions required for the inferences are not currently examined. Analysis of the ratios of nonalignments to assertions from which the nonalignments were identified suggests that BP-MF, BP-BP, BP-CL and CC-CC terms are relatively well-aligned, while ChEBI-MF, BP-ChEBI and CC-MF terms are relatively not aligned well. We propose four ways to resolve an identified nonalignment and recommend an analogous implementation of our methodology in ontology-enrichment tools to identify types of nonalignments that are currently not detected.

Availability: The nonalignments discussed in this article may be viewed at http://compbio.uchsc.edu/Hunter_lab/Bada/nonalignments_2008_03_06.html. Code for the generation of these nonalignments is available upon request.

Contact: mike.bada@uchsc.edu.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Database Management Systems*
  • Databases, Genetic*
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Systems Integration
  • Vocabulary, Controlled*