A statistical quality assessment method for longitudinal observations in electronic health record data with an application to the VA million veteran program

BMC Med Inform Decis Mak. 2021 Oct 20;21(1):289. doi: 10.1186/s12911-021-01643-2.

Abstract

Background: To describe an automated method for assessing the plausibility of continuous variables collected in electronic health record (EHR) data for use in real-world evidence research.

Methods: The most widely used approach to quality assessment (QA) of continuous variables is to detect implausible values using prespecified thresholds. To augment the thresholding method, we developed a score-based method that leverages the longitudinal characteristics of EHR data to detect observations inconsistent with a patient's history. The method was applied to the height and weight data in the EHR from the Million Veteran Program data from the Veteran's Healthcare Administration (VHA). A validation study was also conducted.
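
The contrast between the two approaches can be illustrated with a minimal sketch. It is not the scoring method developed in the paper, whose details are not given in this abstract. As a stand-in, it assumes a simple score: each observation's distance from the patient's own median, scaled by the median absolute deviation (MAD). The function names, the plausibility limits (30 to 250 kg), the `min_scale` floor, and the weight series are all hypothetical.

```python
from statistics import median


def threshold_flags(values, low, high):
    """Flag values outside prespecified plausibility limits (thresholding)."""
    return [not (low <= v <= high) for v in values]


def history_scores(values, min_scale=1.0):
    """Score each value against the patient's own history.

    Assumed stand-in score: absolute distance from the patient's median,
    divided by the MAD (floored at min_scale to avoid division by zero).
    Larger scores mean the value is less consistent with the history.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = max(mad, min_scale)
    return [abs(v - center) / scale for v in values]


# Hypothetical weight history (kg) for one patient; 182.0 looks like
# an entry error, yet it is a physically possible body weight.
weights_kg = [82.0, 83.5, 81.0, 182.0, 84.0, 82.5]

flags = threshold_flags(weights_kg, low=30.0, high=250.0)
scores = history_scores(weights_kg)

for w, flagged, s in zip(weights_kg, flags, scores):
    print(f"{w:6.1f} kg  outside_limits={flagged}  history_score={s:.1f}")
```

In this example the suspicious value passes the fixed limits but receives by far the largest history score, which is the kind of case a longitudinal method is meant to catch. Because the output is a continuous score, an investigator can choose a cutoff that suits the needs of a given analysis.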

Results: The developed method outperforms the widely used thresholding method on receiver operating characteristic (ROC) metrics. We also demonstrate that the choice of quality assessment method has a non-ignorable impact on the body mass index (BMI) classification calculated from height and weight data in the VHA's database.
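
A short sketch shows why data quality matters for BMI classification. BMI is weight in kilograms divided by the square of height in meters, so one erroneous height or weight value can move a patient into a different category. The cut points below are the standard WHO adult categories, which are assumed here and may differ from those used in the study. The function names and patient values are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify a BMI value using standard WHO adult cut points (assumed)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


# Hypothetical patient: true height 1.75 m, misrecorded once as 1.57 m
# (a digit transposition that stays well within plausible height limits).
weight_kg = 82.0
correct = bmi_category(bmi(weight_kg, 1.75))
erroneous = bmi_category(bmi(weight_kg, 1.57))

print(correct, erroneous)
```

Both heights would pass a typical plausibility threshold, yet they yield different BMI categories for the same weight.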

Conclusions: The score-based method enables automated, scalable detection of problematic data points in health care big data while allowing investigators to select high-quality data based on their needs. Leveraging the longitudinal characteristics of EHR data will significantly improve QA performance.

Keywords: Clinical informatics; Data quality assessment (DQA); Electronic health record (EHR); Health care big data; Real world evidence; Vital signs.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Big Data
  • Data Accuracy
  • Data Management
  • Electronic Health Records*
  • Humans
  • Veterans*