Provenance for distributed biomedical workflow execution

Stud Health Technol Inform. 2012:175:91-100.

Abstract

Scientific research has become very data and compute intensive because of the progress in data acquisition and measurement devices, which is particularly true in Life Sciences. To cope with this deluge of data, scientists use distributed computing and storage infrastructures. The use of such infrastructures introduces by itself new challenges to the scientists in terms of proper and efficient use. Scientific workflow management systems play an important role in facilitating the use of the infrastructure by hiding some of its complexity. Although most scientific workflow management systems are provenance-aware, not all of them come with provenance functionality out of the box. In this paper we describe the improvement and integration of a provenance system into an e-infrastructure for biomedical research based on the MOTEUR workflow management system. The main contributions of the paper are: presenting an OPM implementation using relational database backend for the provenance store, providing an e-infrastructure with a comprehensive provenance system, defining a generic approach to provenance implementation, potentially suitable for other workflow systems and application domains and demonstrating the value of this system based on use cases presenting the provenance data through a user-friendly web interface.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biological Science Disciplines*
  • Biomedical Research / methods*
  • Internet*
  • Software*
  • Workflow*