A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data

Bioinformatics. 2007 Nov 15;23(22):3080-7. doi: 10.1093/bioinformatics/btm461. Epub 2007 Sep 19.

Abstract

Motivation: The numerous public data resources make integrative bioinformatics experimentation increasingly important in life sciences research. However, such experimentation is severely hampered by the way the data and information are made available. The semantic web approach enhances data exchange and integration by providing standardized formats such as RDF, RDF Schema (RDFS) and OWL to achieve a formalized computational environment. Our semantic web-enabled data integration (SWEDI) approach aims to formalize biological domains by capturing the knowledge in semantic models using ontologies as controlled vocabularies. The strategy is to build a collection of relatively small but specific knowledge and data models, which together form a 'personal semantic framework'. This framework can be linked to large, general external knowledge and data models. In this way, the scientists involved are familiar with the concepts and associated relationships in their models and can create semantic queries using their own terms. We studied the applicability of our SWEDI approach in the context of a biological use case by integrating genomics data sets for histone modification and transcription factor binding sites.

Results: We constructed four OWL knowledge models and two RDFS data models, transformed and mapped relevant data to the data models, linked the data models to the knowledge models using linkage statements, and ran semantic queries. Our biological use case demonstrates the relevance of these kinds of integrative bioinformatics experiments. Our findings show that the SWEDI approach has high startup costs but can be extended straightforwardly with similar data.
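
The steps listed in the Results (knowledge models, data models, linkage statements, semantic queries) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python tuples as triples instead of an RDF/OWL toolkit, and every identifier (the km:, dm: and data: names, the coordinates, the helper functions) is a hypothetical placeholder chosen for illustration. The linkage statements are modelled here as rdfs:subClassOf triples, which is an assumption about how such links might be expressed.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples.
# All names and coordinates below are hypothetical, not taken from the paper.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Knowledge model: domain concepts in the scientist's own terms.
knowledge_model = {
    ("km:HistoneModification", SUBCLASS, "km:ChromatinFeature"),
    ("km:TFBindingSite", SUBCLASS, "km:ChromatinFeature"),
}

# Data mapped to a data model: records typed with data-model classes.
data_instances = {
    ("data:region1", RDF_TYPE, "dm:HistoneModRecord"),
    ("data:region1", "dm:chromosome", "chr1"),
    ("data:region1", "dm:start", 1000),
    ("data:region1", "dm:end", 2000),
    ("data:site1", RDF_TYPE, "dm:TFBSRecord"),
    ("data:site1", "dm:chromosome", "chr1"),
    ("data:site1", "dm:start", 1500),
    ("data:site1", "dm:end", 1510),
    ("data:site2", RDF_TYPE, "dm:TFBSRecord"),
    ("data:site2", "dm:chromosome", "chr1"),
    ("data:site2", "dm:start", 5000),
    ("data:site2", "dm:end", 5010),
}

# Linkage statements: connect data-model classes to knowledge-model concepts.
linkage = {
    ("dm:HistoneModRecord", SUBCLASS, "km:HistoneModification"),
    ("dm:TFBSRecord", SUBCLASS, "km:TFBindingSite"),
}


def superclasses(cls, graph):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, graph):
    """Return sorted subjects whose declared type is cls or a subclass of it."""
    return sorted(
        s for s, p, o in graph
        if p == RDF_TYPE and cls in superclasses(o, graph)
    )


def value(subject, predicate, graph):
    """Return the first object found for (subject, predicate), or None."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None


def overlapping_pairs(graph):
    """Query in knowledge-model terms: binding sites overlapping a histone mark."""
    pairs = []
    for site in instances_of("km:TFBindingSite", graph):
        for region in instances_of("km:HistoneModification", graph):
            same_chrom = (value(site, "dm:chromosome", graph)
                          == value(region, "dm:chromosome", graph))
            overlaps = (value(site, "dm:start", graph) <= value(region, "dm:end", graph)
                        and value(region, "dm:start", graph) <= value(site, "dm:end", graph))
            if same_chrom and overlaps:
                pairs.append((site, region))
    return pairs


# The integrated graph is the union of all three sets of statements.
graph = knowledge_model | data_instances | linkage
print(overlapping_pairs(graph))
```

In this sketch the query is phrased only in knowledge-model terms (km:TFBindingSite, km:HistoneModification); the data records become visible to it solely through the linkage statements. That also suggests why extension with similar data could be cheap once the models exist: a new data set of the same kind would need only its own records and linkage statements, not new models or queries.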

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Computational Biology / methods*
  • Database Management Systems*
  • Databases, Genetic*
  • Genomics / methods*
  • Information Storage and Retrieval / methods
  • Internet*
  • Natural Language Processing*
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / metabolism
  • Research Design
  • Systems Integration

Substances

  • Proteins