A Comparison of Probabilistic and Deterministic Match Strategies for Linking Prehospital and in-Hospital Stroke Registry Data

J Stroke Cerebrovasc Dis. 2020 Oct;29(10):105151. doi: 10.1016/j.jstrokecerebrovasdis.2020.105151. Epub 2020 Jul 30.

Abstract

Background: Understanding and improving EMS stroke care requires linking data from both the prehospital and hospital settings. In the US, such data are collected in separate de-identified registries that cannot be directly linked because they lack a common, unique patient identifier. In the absence of unique patient identifiers, two common approaches to linking databases are deterministic matching, which uses combinations of non-unique matching variables to define matches, and probabilistic matching, which generates estimates of match probability based on the degree of similarity between records. This analysis compares these two approaches for matching EMS and stroke registry data.
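The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name a specific probabilistic method, so the Fellegi-Sunter style agreement weights used here are an assumption, and the field names, the m- and u-probabilities, and the threshold are all hypothetical values chosen for illustration.

```python
from math import log2

# Hypothetical per-field probabilities (illustrative values, not from the study):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "hospital": (0.98, 0.05),
    "service_date": (0.95, 0.01),
    "age": (0.90, 0.02),
    "sex": (0.98, 0.50),
}

# Hypothetical cut-off above which a pair is accepted as a link.
HYPOTHETICAL_THRESHOLD = 5.0


def deterministic_match(a, b, fields=FIELD_PROBS):
    """Declare a match only if every matching variable agrees exactly."""
    return all(a[f] == b[f] for f in fields)


def probabilistic_score(a, b, probs=FIELD_PROBS):
    """Sum log2 agreement or disagreement weights across fields."""
    score = 0.0
    for field, (m, u) in probs.items():
        if a[field] == b[field]:
            score += log2(m / u)              # agreement adds evidence
        else:
            score += log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return score


# Toy records that differ only in age (71 vs 72), e.g. a rounding discrepancy.
registry_case = {"hospital": "H01", "service_date": "2018-03-04", "age": 71, "sex": "F"}
ems_record = {"hospital": "H01", "service_date": "2018-03-04", "age": 72, "sex": "F"}

is_deterministic_link = deterministic_match(registry_case, ems_record)
is_probabilistic_link = (
    probabilistic_score(registry_case, ems_record) >= HYPOTHETICAL_THRESHOLD
)
```

In this toy pair the single disagreement on age blocks the deterministic link, while the agreement on the other three fields still carries the probabilistic score over the hypothetical threshold. This is one way a probabilistic strategy can recover links that an exact-agreement rule misses.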

Methods: Stroke cases transported by EMS to Michigan hospitals participating in the Michigan Coverdell Acute Stroke Registry were linked to records from Michigan's EMS Information System (MI-EMSIS) between January 2018 and June 2019. Destination hospital, date-of-service, patient age, date-of-birth, and sex were used to perform deterministic and probabilistic linkages. Match rates and representativeness of the matched samples were compared between the two matching strategies. Multivariable logistic regression was used to identify characteristics associated with successful matching.
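When a small registry is linked against a much larger transport file, comparing every possible pair of records is usually impractical. The sketch below shows one common way to limit comparisons, known as blocking, using destination hospital and date-of-service as the blocking key. The abstract does not state whether blocking was used, so this is an assumption, and the function names and record layout are hypothetical.

```python
from collections import defaultdict


def build_blocks(ems_records):
    """Group EMS records by (destination hospital, date of service)."""
    blocks = defaultdict(list)
    for rec in ems_records:
        blocks[(rec["hospital"], rec["service_date"])].append(rec)
    return blocks


def candidate_pairs(registry_cases, ems_records):
    """Yield only the (registry case, EMS record) pairs that share a block."""
    blocks = build_blocks(ems_records)
    for case in registry_cases:
        key = (case["hospital"], case["service_date"])
        # .get avoids creating empty blocks for keys with no EMS transports
        for rec in blocks.get(key, []):
            yield case, rec


# Toy data with hypothetical identifiers and values.
registry_cases = [
    {"id": "S1", "hospital": "H01", "service_date": "2018-03-04", "age": 71, "sex": "F"},
    {"id": "S2", "hospital": "H02", "service_date": "2018-03-05", "age": 64, "sex": "M"},
]
ems_records = [
    {"id": "E1", "hospital": "H01", "service_date": "2018-03-04", "age": 72, "sex": "F"},
    {"id": "E2", "hospital": "H01", "service_date": "2018-03-04", "age": 35, "sex": "M"},
    {"id": "E3", "hospital": "H03", "service_date": "2018-03-05", "age": 64, "sex": "M"},
]

pairs = [(case["id"], rec["id"]) for case, rec in candidate_pairs(registry_cases, ems_records)]
```

Each candidate pair would then be compared on the remaining variables (patient age, date-of-birth, and sex) by whichever matching rule is in use. A strict blocking key has a cost: a case whose hospital or date is recorded differently in the two sources never reaches the comparison step.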

Results: During the 18-month study period there were 8,828 EMS-transported confirmed stroke cases in the registry and 620,907 EMS transports to 38 Coverdell registry-participating hospitals. The probabilistic match linked 5,985 (67.7%) strokes to EMS records; the deterministic match linked 4,012 (45.5%). Within each strategy the characteristics of matched and unmatched cases were similar, except that deterministically matched cases were less likely than unmatched cases to be older than 89 (adjusted odds ratio [aOR]=0.3) or white (aOR=0.8) and more likely to have subarachnoid hemorrhage (aOR=1.4).

Conclusion: Probabilistic matching resulted in higher match rates and a more representative sample of EMS-transported strokes, suggesting it may be better suited than a deterministic approach for assessing EMS stroke care.

Keywords: Data linkage; Emergency Medical Services (EMS); Probabilistic matching; Quality improvement; Stroke.

Publication types

  • Comparative Study

MeSH terms

  • Aged
  • Aged, 80 and over
  • Ambulances / standards
  • Data Mining / methods*
  • Emergency Medical Services / standards*
  • Emergency Service, Hospital / standards*
  • Female
  • Humans
  • Male
  • Medical Record Linkage*
  • Michigan
  • Middle Aged
  • Probability
  • Quality Improvement / standards*
  • Quality Indicators, Health Care / standards*
  • Registries
  • Stroke / diagnosis
  • Stroke / therapy*
  • Treatment Outcome