Combining Super Learner with high-dimensional propensity score to improve confounding adjustment: A real-world application in chronic lymphocytic leukemia

Pharmacoepidemiol Drug Saf. 2024 Jan;33(1):e5678. doi: 10.1002/pds.5678. Epub 2023 Aug 23.

Abstract

Purpose: High-dimensional propensity score (hdPS) is a semiautomated method that leverages a vast number of covariates available in healthcare databases to improve confounding adjustment. A novel combined Super Learner (SL)-hdPS approach was proposed to assist with selecting the number of covariates for propensity score inclusion, and was found in plasmode simulation studies to improve bias reduction and precision compared to hdPS alone. However, the approach has not been examined in the applied setting.

Methods: We compared SL-hdPS's performance with that of several hdPS models, each with prespecified covariates and a different number of empirically identified covariates, using a cohort study comparing real-world bleeding rates between ibrutinib- and bendamustine-rituximab (BR)-treated individuals with chronic lymphocytic leukemia in Optum's de-identified Clinformatics® Data Mart commercial claims database (2013-2020). We used inverse probability of treatment weighting for confounding adjustment and Cox proportional hazards regression to estimate hazard ratios (HRs) for bleeding outcomes. Parameters of interest included balance of the prespecified and empirically identified covariates (absolute standardized difference [ASD] thresholds of <0.10 and <0.05) and precision of the outcome HRs (width of the 95% confidence intervals).
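The balance metric named in the Methods can be illustrated with a minimal sketch. The abstract does not state which weighting variant or ASD formula was used, so the code below assumes unstabilized inverse probability of treatment weights and a weighted ASD whose denominator is the square root of the average of the two weighted group variances. The function names (iptw_weights, weighted_asd) and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def iptw_weights(treated, ps):
    """Unstabilized weights (assumed variant): 1/ps if treated, 1/(1-ps) otherwise."""
    treated = np.asarray(treated, dtype=bool)
    ps = np.asarray(ps, dtype=float)
    return np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))


def weighted_asd(x, treated, weights):
    """Absolute standardized difference of covariate x between treatment groups.

    Assumed formula: |weighted mean difference| divided by
    sqrt((weighted var in treated + weighted var in comparators) / 2).
    """
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.asarray(weights, dtype=float)

    def mean_var(mask):
        m = np.average(x[mask], weights=w[mask])
        v = np.average((x[mask] - m) ** 2, weights=w[mask])
        return m, v

    m1, v1 = mean_var(treated)
    m0, v0 = mean_var(~treated)
    pooled_sd = np.sqrt((v1 + v0) / 2.0)
    if pooled_sd == 0:
        # Covariate is constant within both groups; report no imbalance.
        return 0.0
    return abs(m1 - m0) / pooled_sd


# Toy data (hypothetical): one binary covariate, six individuals.
treated = [1, 1, 1, 0, 0, 0]
covariate = [1, 1, 0, 1, 0, 0]
ps = [0.7, 0.7, 0.4, 0.7, 0.4, 0.4]  # made-up propensity scores

w = iptw_weights(treated, ps)
asd_crude = weighted_asd(covariate, treated, np.ones(len(treated)))
asd_weighted = weighted_asd(covariate, treated, w)
print(f"ASD before weighting: {asd_crude:.3f}")
print(f"ASD after weighting:  {asd_weighted:.3f}")
```

Under these assumptions, a covariate would count as balanced at a given threshold when its weighted ASD falls below 0.10 or 0.05, and the number of covariates meeting that criterion could then be tallied for each candidate propensity score model.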

Results: We identified 2423 ibrutinib- and 1102 BR-treated individuals. Including >200 empirically identified covariates in the hdPS model compromised covariate balance at both ASD thresholds. SL-hdPS balanced more covariates than all individual hdPS models at both ASD thresholds. The bleeding HR 95% confidence intervals were generally narrower with SL-hdPS than with individual hdPS models.

Conclusion: In a real-world application, hdPS was sensitive to the number of covariates included, while use of SL for covariate selection resulted in improved covariate balance and possibly improved precision.

Keywords: Super Learner; bleeding; chronic lymphocytic leukemia; confounding; high-dimensional propensity score; observational study; pharmacoepidemiology.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cohort Studies
  • Computer Simulation
  • Humans
  • Leukemia, Lymphocytic, Chronic, B-Cell* / drug therapy
  • Propensity Score
  • Proportional Hazards Models