A Data Quality Ontology for the Secondary Use of EHR Data

AMIA Annu Symp Proc. 2015 Nov 5;2015:1937-46. eCollection 2015.

Abstract

The secondary use of electronic health record (EHR) data for research is expected to improve health outcomes for patients, but the benefits will be realized only if the data in the EHR are of sufficient quality to support these uses. A data quality (DQ) ontology was developed to rigorously define concepts and enable automated computation of data quality measures. The healthcare data quality literature was mined for the important terms used to describe data quality concepts, and these terms were harmonized into an ontology. Four high-level data quality dimensions ("correctness", "consistency", "completeness" and "currency") categorize 19 lower-level measures. The ontology serves as an unambiguous vocabulary that defines concepts more precisely than natural language; it provides a mechanism to automatically compute data quality measures; and it is reusable across domains and use cases. A detailed example is presented to demonstrate its utility. The DQ ontology can make data validation more common and reproducible.
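
To make the idea of automatically computed data quality measures concrete, the following is a minimal Python sketch of two measures, one under the "completeness" dimension and one under the "currency" dimension. It is not taken from the paper: the record layout, the field names (`weight_kg`, `recorded_on`), the function names, and the exact measure definitions are all illustrative assumptions, and the ontology's own 19 measures may be defined differently.

```python
from datetime import date, timedelta

# Hypothetical toy records; field names are illustrative, not from the paper.
RECORDS = [
    {"patient_id": 1, "weight_kg": 70.5, "recorded_on": date(2015, 3, 1)},
    {"patient_id": 2, "weight_kg": None, "recorded_on": date(2015, 6, 15)},
    {"patient_id": 3, "weight_kg": 82.0, "recorded_on": date(2013, 1, 10)},
    {"patient_id": 4, "weight_kg": 64.2, "recorded_on": date(2015, 9, 30)},
]


def completeness(records, field):
    """Assumed definition: fraction of records with a non-missing value for `field`."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)


def currency(records, date_field, as_of, max_age_days):
    """Assumed definition: fraction of records dated within `max_age_days` of `as_of`."""
    if not records:
        return 0.0
    cutoff = as_of - timedelta(days=max_age_days)
    recent = sum(
        1
        for r in records
        if r.get(date_field) is not None and r[date_field] >= cutoff
    )
    return recent / len(records)


if __name__ == "__main__":
    print("completeness(weight_kg):", completeness(RECORDS, "weight_kg"))
    print(
        "currency(recorded_on, 365 days):",
        currency(RECORDS, "recorded_on", date(2015, 11, 5), 365),
    )
```

Writing each measure as a function of the data, a field, and explicit parameters reflects the abstract's point about reuse across domains and use cases: the same definition can be applied to a different field or data set without being rewritten.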

MeSH terms

  • Biological Ontologies*
  • Data Accuracy*
  • Electronic Health Records*
  • Humans