Evaluation of code-based algorithms to identify pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension patients in large administrative databases

Pulm Circ. 2020 Nov 10;10(4):2045894020961713. doi: 10.1177/2045894020961713. eCollection 2020 Oct-Dec.

Abstract

Large administrative healthcare (including insurance claims) databases are used for various retrospective real-world evidence studies. However, in pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension, identifying patients retrospectively based on administrative codes remains challenging: it relies on code combinations (algorithms), and for most of these the accuracy of patient identification is unknown. This study aimed to assess the performance of various algorithms in correctly identifying patients with pulmonary arterial hypertension or chronic thromboembolic pulmonary hypertension in administrative databases. A systematic literature review was performed to find publications detailing code-based algorithms used to identify pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension patients. PheValuator, a diagnostic predictive modelling tool, was applied to three US claims databases, yielding models that estimated the probability of a patient having the disease. These models were used to evaluate the performance characteristics of selected pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension algorithms. With increasing algorithm complexity, average positive predictive value increased (pulmonary arterial hypertension: 13.4-66.0%; chronic thromboembolic pulmonary hypertension: 10.3-75.1%) and average sensitivity decreased (pulmonary arterial hypertension: 61.5-2.7%; chronic thromboembolic pulmonary hypertension: 20.7-0.2%). Specificities and negative predictive values were high (≥97.5%) for all algorithms. Several of the algorithms performed well overall when all four performance parameters were considered, and all algorithms performed with similar accuracy across the three claims databases studied, even though most were designed for patient identification in a specific database.
The most suitable algorithm therefore depends on the objective of the study: one- or two-component algorithms are the most inclusive, whereas three- or four-component algorithms identify the most precisely defined pulmonary arterial hypertension or chronic thromboembolic pulmonary hypertension populations.
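
The four performance parameters reported above can be illustrated with a minimal Python sketch. It assumes that each patient has a model-estimated disease probability and a flag indicating whether a given code-based algorithm selected that patient, and that expected true/false positive and negative counts are obtained by summing those probabilities. The function and class names are hypothetical; this is not PheValuator's own code or API, only an illustration of how such metrics might be derived from probabilistic labels.

```python
import math
from dataclasses import dataclass


@dataclass
class Performance:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def evaluate_algorithm(probabilities, flagged) -> Performance:
    """Estimate performance of a code-based algorithm (hypothetical helper).

    probabilities: model-estimated probability of disease per patient (0..1).
    flagged: True where the algorithm identified the patient as a case.
    Assumption: expected counts are sums of probabilities, so a patient
    with probability 0.7 contributes 0.7 of a case and 0.3 of a non-case.
    """
    if len(probabilities) != len(flagged):
        raise ValueError("probabilities and flagged must have equal length")

    tp = fp = fn = tn = 0.0
    for p, hit in zip(probabilities, flagged):
        if hit:
            tp += p          # flagged and (probably) diseased
            fp += 1.0 - p    # flagged but (probably) not diseased
        else:
            fn += p          # missed and (probably) diseased
            tn += 1.0 - p    # missed and (probably) not diseased

    return Performance(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )
```

Under these assumptions, adding components to an algorithm flags fewer patients, which tends to raise the positive predictive value while lowering sensitivity, consistent with the trade-off described in the abstract.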

Keywords: PheValuator; chronic thromboembolic pulmonary hypertension; claims databases; pulmonary arterial hypertension; validation.