Use of artificial intelligence for nonlinear benchmarking of surgical care

Ander Dorken-Gallastegi; Majed El Hechi; Maxime Amram; Leon Naar; Lydia R Maurer; Anthony Gebran; Jack Dunn; Ying Daisy Zhuo; Jordan Levine; Dimitris Bertsimas; Haytham M A Kaafarani

doi:10.1016/j.surg.2023.08.025

Surgery. 2023 Dec;174(6):1302-1308. doi: 10.1016/j.surg.2023.08.025. Epub 2023 Sep 29.

Affiliations

¹ Trauma, Emergency Surgery, and Surgical Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA; Center for Outcomes and Patient Safety in Surgery, Massachusetts General Hospital, Boston, MA.
² Alexandria Health, Cambridge, MA.
³ Massachusetts Institute of Technology, Cambridge, MA.
⁴ Trauma, Emergency Surgery, and Surgical Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA; Center for Outcomes and Patient Safety in Surgery, Massachusetts General Hospital, Boston, MA. Electronic address: hkaafarani@mgh.harvard.edu.

PMID: 37778969

Abstract

Background: Existing methodologies for benchmarking the quality of surgical care are linear and fail to capture the complex interactions among preoperative variables. We sought to leverage novel nonlinear artificial intelligence methodologies to benchmark emergency surgical care.

Methods: We used optimal classification trees, a nonlinear but interpretable artificial intelligence methodology, in 2 steps. First, the overall observed mortality rate of the index hospital's emergency surgery population (index cohort) was compared to the risk-adjusted expected mortality rate calculated by optimal classification trees derived from the American College of Surgeons National Surgical Quality Improvement Program database (benchmark cohort). Second, the optimal classification trees created different "nodes" of care, each representing a specific patient phenotype that the trees defined without human interference to optimize prediction. These nodes capture multiple iterative risk-adjusted comparisons, permitting the identification of specific areas of excellence and areas for improvement.
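The node-level comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the node names, patient counts, and expected rates are invented toy values, the helper names are hypothetical, and the exact two-sided binomial test with a 0.05 threshold is an assumption, since the abstract does not state which test was applied within each node.

```python
from collections import defaultdict
from math import comb


def binom_two_sided_p(k, n, p0):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9)))


def benchmark_nodes(index_patients, expected_rate_by_node, alpha=0.05):
    """Compare observed index-cohort mortality with the benchmark
    (expected) mortality rate inside each tree node."""
    counts = defaultdict(lambda: [0, 0])  # node -> [deaths, patients]
    for node, died in index_patients:
        counts[node][0] += int(died)
        counts[node][1] += 1

    report = {}
    for node, (deaths, n) in counts.items():
        expected = expected_rate_by_node[node]
        observed = deaths / n
        p = binom_two_sided_p(deaths, n, expected)
        if p >= alpha:
            label = "as expected"
        elif observed < expected:
            label = "area of excellence"
        else:
            label = "area for improvement"
        report[node] = (observed, expected, p, label)
    return report


# Toy data (hypothetical): (node, died) pairs for index-cohort patients.
index_patients = (
    [("A", True)] * 2 + [("A", False)] * 38
    + [("B", True)] * 15 + [("B", False)] * 15
    + [("C", True)] * 5 + [("C", False)] * 45
)
# Hypothetical benchmark mortality rate for each node.
expected_rate_by_node = {"A": 0.30, "B": 0.20, "C": 0.10}

report = benchmark_nodes(index_patients, expected_rate_by_node)
for node, (obs, exp, p, label) in sorted(report.items()):
    print(f"node {node}: observed {obs:.1%}, expected {exp:.1%}, p={p:.3g}, {label}")
```

In this sketch each node is treated as its own small benchmarking exercise, which is one way to read "multiple iterative risk-adjusted comparisons"; a real analysis would also need to consider small node sizes and multiple testing.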

Results: The index and benchmark cohorts included 1,600 and 637,086 patients, respectively. The observed and risk-adjusted expected mortality rates of the index cohort calculated by optimal classification trees were similar (8.06% [95% confidence interval: 6.8-9.5] vs 7.53%, respectively, P = .42). Two areas of excellence and 4 areas for improvement were identified. For example, the index cohort had lower-than-expected mortality when patients were older than 75 years and in respiratory failure and septic shock preoperatively but higher-than-expected mortality when patients had respiratory failure preoperatively and were thrombocytopenic, with an international normalized ratio ≤1.7.
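As a rough arithmetic check on the overall comparison, the following Python sketch recomputes a confidence interval and a P value from the reported figures. Several things here are assumptions rather than facts from the abstract: the death count is inferred by rounding 8.06% of 1,600, the Wilson score interval and the one-sample z-test against the expected rate are stand-ins for whatever methods the authors used, and the function names are hypothetical.

```python
from math import erfc, sqrt


def wilson_interval(deaths, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    p = deaths / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def one_sample_z_p(deaths, n, p0):
    """Two-sided p-value of a one-sample z-test of the observed
    proportion against a fixed expected proportion p0."""
    z = (deaths / n - p0) / sqrt(p0 * (1 - p0) / n)
    return erfc(abs(z) / sqrt(2))


n = 1600                       # index cohort size (reported)
deaths = round(0.0806 * n)     # inferred from the reported 8.06%; not stated in the abstract
expected = 0.0753              # risk-adjusted expected mortality (reported)

lo, hi = wilson_interval(deaths, n)
p_value = one_sample_z_p(deaths, n, expected)
print(f"observed {deaths / n:.2%}, 95% CI {lo:.1%} to {hi:.1%}, P = {p_value:.2f}")
```

Under these assumptions the sketch gives an interval of roughly 6.8% to 9.5% and a P value near .42, in line with the reported figures. That agreement does not establish which interval or test the authors actually used.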

Conclusion: We used artificial intelligence methodology to benchmark the quality of emergency surgical care. Such nonlinear and interpretable methods promise a more comprehensive evaluation and a deeper dive into areas of excellence versus suboptimal care.

MeSH terms

Artificial Intelligence
Benchmarking
Databases, Factual
Emergency Medical Services*
Humans
Respiratory Insufficiency*