Comparative validity of methods to select appropriate cutoff weight for probabilistic linkage without unique personal identifiers

Pharmacoepidemiol Drug Saf. 2016 Apr;25(4):444-52. doi: 10.1002/pds.3832. Epub 2015 Jul 14.

Abstract

Purpose: Record linkage can enhance the data quality of observational database studies. Probabilistic linkage, a method that allows partial matches on linkage variables, overcomes disagreements arising from errors and omissions in data entry but also produces false-positive links. The study aimed to assess the validity of probabilistic linkage in the absence of unique personal identifiers (UPI) and of methods for selecting the cutoff weight.
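
As an illustration of how a partial match can be scored, the sketch below computes a total match weight for one candidate record pair using the standard log-ratio (Fellegi-Sunter style) weighting. The abstract does not state which weighting scheme the study used, so the scheme, the three linkage variables, and all m- and u-probabilities here are assumptions chosen only for illustration.

```python
import math

def field_weight(agrees, m, u):
    """Weight contributed by one linkage variable.

    m: assumed probability that the variable agrees for a true match.
    u: assumed probability that the variable agrees for a non-match.
    """
    if agrees:
        return math.log2(m / u)            # agreement weight (positive when m > u)
    return math.log2((1 - m) / (1 - u))    # disagreement weight (negative when m > u)

def pair_weight(agreements, m_probs, u_probs):
    """Total match weight of a record pair: the sum of its field weights."""
    return sum(
        field_weight(a, m, u)
        for a, m, u in zip(agreements, m_probs, u_probs)
    )

# Hypothetical variables and probabilities, not taken from the study.
FIELDS = ["birth_date", "sex", "procedure_date"]
M_PROBS = [0.95, 0.98, 0.90]
U_PROBS = [0.01, 0.50, 0.02]

full_agreement = pair_weight([True, True, True], M_PROBS, U_PROBS)
partial_agreement = pair_weight([True, True, False], M_PROBS, U_PROBS)
```

A pair that disagrees on one variable still receives a weight, only a lower one. The cutoff weight then decides which pairs are accepted as links.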

Methods: We linked an implantable cardioverter defibrillator placement registry to 1 year of Medicare inpatient files using anonymous, nonunique variables and assessed the validity of three methods of cutoff selection against an internally derived gold standard based on UPI.

Results: Of the 64,890 registry records with an expected linkage rate of 55-65%, 55% were linked at cutoffs associated with positive predictive value (PPV) of ≥90%. Histogram inspection suggested an approximate range of optimal cutoffs. The duplicate method made accurate estimates of cutoff and PPV if the method's assumption was met. With adjusted estimates of the sizes of true matches and searched files, the odds formula method made relatively accurate estimates of cutoff and PPV.
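
The relationship between a cutoff weight and PPV can be sketched as follows, assuming each candidate link carries a match weight and a true/false label from a gold standard. The toy weights and labels below are invented for illustration and are not study data; the function names are likewise hypothetical.

```python
def ppv_at_cutoff(pairs, cutoff):
    """PPV among pairs accepted at this cutoff (weight >= cutoff).

    pairs: list of (weight, is_true_match) tuples.
    Returns None when no pair reaches the cutoff.
    """
    accepted = [is_true for weight, is_true in pairs if weight >= cutoff]
    if not accepted:
        return None
    return sum(accepted) / len(accepted)

def lowest_cutoff_meeting_ppv(pairs, target_ppv):
    """Smallest observed weight whose PPV reaches the target, or None."""
    for cutoff in sorted({weight for weight, _ in pairs}):
        ppv = ppv_at_cutoff(pairs, cutoff)
        if ppv is not None and ppv >= target_ppv:
            return cutoff
    return None

# Invented (weight, gold-standard label) pairs, for illustration only.
toy_pairs = [
    (2.0, False), (5.0, False), (6.0, True),
    (8.0, True), (11.0, True), (14.0, True),
]

cutoff_for_90 = lowest_cutoff_meeting_ppv(toy_pairs, 0.90)
```

Lowering the cutoff accepts more links but admits more false positives, which is the trade-off that cutoff selection has to resolve. Without a gold standard the labels are unavailable, which is why the estimation methods compared in the study are needed.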

Conclusions: Probabilistic linkage without UPI generated valid linkages when an optimal cutoff was chosen. Cutoff selection remains challenging; however, histogram inspection, the duplicate method, and the odds formula method can be used in conjunction when a gold standard is not available.

Keywords: Medicare; cutoff; database; pharmacoepidemiology; probabilistic model; record linkage; registry.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Aged
  • Databases, Factual / statistics & numerical data*
  • Defibrillators, Implantable / statistics & numerical data*
  • Humans
  • Inpatients
  • Medical Record Linkage / methods*
  • Medicare
  • Predictive Value of Tests
  • Probability
  • Registries
  • United States