A statistical quality assessment method for longitudinal observations in electronic health record data with an application to the VA million veteran program

Hui Wang; Ilana Belitskaya-Levy; Fan Wu; Jennifer S Lee; Mei-Chiung Shih; Philip S Tsao; Ying Lu; VA Million Veteran Program

doi:10.1186/s12911-021-01643-2

BMC Med Inform Decis Mak. 2021 Oct 20;21(1):289. doi: 10.1186/s12911-021-01643-2.

Authors

Hui Wang¹, Ilana Belitskaya-Levy¹, Fan Wu¹, Jennifer S Lee¹ ² ³, Mei-Chiung Shih¹ ⁴, Philip S Tsao¹ ², Ying Lu⁵ ⁶ ⁷; VA Million Veteran Program

Affiliations

¹ Department of Veterans Affairs, Cooperative Studies Program Palo Alto Coordinating Center, 701B North Shoreline Blvd, Mountain View, CA, 94043, USA.
² Department of Medicine, Stanford University School of Medicine, 1265 Welch Road, Stanford, CA, 94305-5464, USA.
³ Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, 94305, USA.
⁴ Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road, X359, Stanford, CA, 94305-5464, USA.
⁵ Department of Veterans Affairs, Cooperative Studies Program Palo Alto Coordinating Center, 701B North Shoreline Blvd, Mountain View, CA, 94043, USA. ylu1@stanford.edu.
⁶ Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, 94305, USA. ylu1@stanford.edu.
⁷ Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road, X359, Stanford, CA, 94305-5464, USA. ylu1@stanford.edu.

Abstract

Background: To describe an automated method for assessing the plausibility of continuous variables collected in electronic health record (EHR) data for use in real-world evidence research.

Methods: The most widely used approach to quality assessment (QA) of continuous variables is to detect implausible values using prespecified thresholds. To augment the thresholding method, we developed a score-based method that leverages the longitudinal characteristics of EHR data to detect observations inconsistent with a patient's history. The method was applied to the height and weight data in the EHR from the Million Veteran Program data from the Veterans Health Administration (VHA). A validation study was also conducted.
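The contrast between the two approaches can be illustrated with a minimal Python sketch. The abstract does not define the authors' score, so the longitudinal score below is a stand-in assumption: a robust distance from the patient's own median, scaled by the median absolute deviation. The function names, the 120 to 220 cm plausibility range, and the 3.5 cutoff are likewise hypothetical and chosen only for illustration.

```python
from statistics import median


def flag_by_threshold(values, low, high):
    """Flag each value that falls outside the prespecified range [low, high]."""
    return [not (low <= v <= high) for v in values]


def longitudinal_scores(values):
    """Hypothetical score (not the authors' definition): distance of each
    observation from the patient's own median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 makes the MAD comparable to a standard deviation under normality;
    # fall back to 1.0 when all deviations are zero to avoid dividing by zero.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [abs(v - med) / scale for v in values]


# One patient's recorded heights in cm; 187.0 looks like 178 with digits transposed.
heights_cm = [178.0, 177.5, 178.2, 187.0, 178.1, 177.8]

# Assumed population-level plausibility range for adult height.
threshold_flags = flag_by_threshold(heights_cm, low=120.0, high=220.0)

# Assumed cutoff on the illustrative score.
scores = longitudinal_scores(heights_cm)
score_flags = [s > 3.5 for s in scores]

print(threshold_flags)
print(score_flags)
```

In this toy record, 187.0 cm is a plausible adult height, so the threshold check passes it, while the patient-level score singles it out as inconsistent with the other measurements.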

Results: In terms of receiver operating characteristic (ROC) metrics, the developed method outperforms the widely used thresholding method. The choice of quality assessment method is also shown to have a non-ignorable impact on body mass index (BMI) classification calculated from height and weight data in the VHA's database.
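To see why an undetected height error can change a BMI classification, consider the following Python sketch. It uses the standard formula BMI = weight (kg) / height (m)² and the standard WHO adult categories; the abstract does not state which cutoffs the study used, so these are an assumption, as are the function names and the example values.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Standard WHO adult categories (assumed here, not taken from the study)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


weight_kg = 80.0
true_height_m = 1.78      # consistent with the patient's history
recorded_height_m = 1.87  # illustrative data-entry error (digits transposed)

category_true = bmi_category(bmi(weight_kg, true_height_m))
category_recorded = bmi_category(bmi(weight_kg, recorded_height_m))

print(category_true, category_recorded)
```

Both heights would pass a population-level threshold check, yet the erroneous value moves this hypothetical patient from the overweight category to the normal category.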

Conclusions: The score-based method enables automated, scalable detection of problematic data points in health care big data while allowing investigators to select high-quality data according to their needs. Leveraging the longitudinal characteristics of EHR data will significantly improve QA performance.

Keywords: Clinical informatics; Data quality assessment (DQA); Electronic health record (EHR); Health care big data; Real world evidence; Vital signs.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Big Data
Data Accuracy
Data Management
Electronic Health Records*
Humans
Veterans*
