Inconsistency in UK Biobank Event Definitions From Different Data Sources and Its Impact on Bias and Generalizability: A Case Study of Venous Thromboembolism

Am J Epidemiol. 2024 May 7;193(5):787-797. doi: 10.1093/aje/kwad232.

Abstract

The UK Biobank study contains several sources of diagnostic data, including hospital inpatient data and data on self-reported conditions for approximately 500,000 participants and primary-care data for approximately 177,000 participants (35%). Epidemiologic investigations require a primary disease definition, but whether to combine data sources to maximize statistical power or focus on only 1 source to ensure a consistent outcome is not clear. The consistency of disease definitions was investigated for venous thromboembolism (VTE) by evaluating overlap when defining cases from 3 sources: hospital inpatient data, primary-care reports, and self-reported questionnaires. VTE cases showed little overlap between data sources, with only 6% of reported events for persons with primary-care data being identified by all 3 sources (hospital, primary-care, and self-reports), while 71% appeared in only 1 source. Deep vein thrombosis-only events represented 68% of self-reported VTE cases and 36% of hospital-reported VTE cases, while pulmonary embolism-only events represented 20% of self-reported VTE cases and 50% of hospital-reported VTE cases. Additionally, different distributions of sociodemographic characteristics were observed; for example, patients in 46% of hospital-reported VTE cases were female, compared with 58% of self-reported VTE cases. These results illustrate how seemingly neutral decisions taken to improve data quality can affect the representativeness of a data set.
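The overlap evaluation described above can be illustrated with a minimal sketch. It assumes each data source has already been reduced to a set of participant identifiers meeting that source's VTE definition. The function name `source_overlap`, the `restrict_to` parameter, and the toy identifiers are hypothetical and are not taken from the study or from any UK Biobank tooling. The sketch reports what share of all identified cases appears in exactly 1, 2, or 3 sources.

```python
from collections import Counter


def source_overlap(cases_by_source, restrict_to=None):
    """Return the share of cases flagged by exactly n sources, for n = 1..k.

    cases_by_source: dict mapping a source name to an iterable of participant IDs.
    restrict_to: optional set of IDs to keep (for example, participants who
                 have linked primary-care data); hypothetical convenience.
    """
    flagged = Counter()
    for ids in cases_by_source.values():
        ids = set(ids)  # a participant counts once per source
        if restrict_to is not None:
            ids &= set(restrict_to)
        flagged.update(ids)

    total = len(flagged)  # participants flagged by at least one source
    n_sources = len(cases_by_source)
    if total == 0:
        return {n: 0.0 for n in range(1, n_sources + 1)}

    by_count = Counter(flagged.values())
    return {n: by_count.get(n, 0) / total for n in range(1, n_sources + 1)}


# Toy identifiers for illustration only; these are NOT study data.
toy_cases = {
    "hospital": {1, 2, 3, 4},
    "primary_care": {3, 4, 5},
    "self_report": {4, 6, 7, 8},
}
shares = source_overlap(toy_cases)
print(shares)  # {1: 0.75, 2: 0.125, 3: 0.125}
```

In this toy example, 75% of cases appear in only 1 source and 12.5% appear in all 3. The abstract's corresponding figures (71% and 6%) come from the actual UK Biobank data and cannot be reproduced from this sketch.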

Keywords: UK Biobank; bias; deep vein thrombosis; event definition; generalizability; pulmonary embolism; representativeness; sociodemographic characteristics; venous thromboembolism.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Bias
  • Biological Specimen Banks
  • Female
  • Humans
  • Information Sources
  • Male
  • Middle Aged
  • Primary Health Care / statistics & numerical data
  • Pulmonary Embolism / epidemiology
  • Self Report*
  • UK Biobank
  • United Kingdom / epidemiology
  • Venous Thromboembolism* / epidemiology
  • Venous Thrombosis / epidemiology