Online artificial intelligence platforms and their applicability to gastrointestinal surgical operations

J Gastrointest Surg. 2024 Jan;28(1):64-69. doi: 10.1016/j.gassur.2023.11.019.

Abstract

Background: The internet is a common source of health information for patients. Interactive online artificial intelligence (AI) may be a more reliable source of health-related information than traditional search engines. This study aimed to assess the quality and perceived utility of chat-based AI responses related to 3 common gastrointestinal (GI) surgical procedures.

Methods: A 24-question survey covering general perioperative information on cholecystectomy, pancreaticoduodenectomy (PD), and colectomy was created. Each question was posed to Chat Generative Pre-trained Transformer (ChatGPT) in June 2023, and the generated responses were recorded. The quality and perceived utility of the responses were graded independently and subjectively by surgeon respondents with expertise in the relevant surgical field. Grades were classified as "poor," "fair," "good," "very good," or "excellent."

Results: Among the 45 respondents (general surgeon [n = 13], surgical oncologist [n = 18], colorectal surgeon [n = 13], and transplant surgeon [n = 1]), most practiced at an academic facility (95.6%). Respondents had been in practice for a mean of 12.3 years (general surgeon, 14.5 ± 7.2; surgical oncologist, 12.1 ± 8.2; colorectal surgeon, 10.2 ± 8.0) and performed a mean of 53 index operations annually (cholecystectomy, 47 ± 28; PD, 28 ± 27; colectomy, 81 ± 44). Most of the 1080 quality grades were "fair" or "good" (n = 622, 57.6%). Likewise, most of the 1080 utility grades were "fair" (n = 279, 25.8%) or "good" (n = 344, 31.9%), whereas only 129 utility grades (11.9%) were "poor." Of note, ChatGPT responses related to cholecystectomy (45.3% ["very good"/"excellent"] vs 18.1% ["poor"/"fair"]) were deemed to be of better quality than AI responses about PD (18.9% ["very good"/"excellent"] vs 46.9% ["poor"/"fair"]) or colectomy (31.4% ["very good"/"excellent"] vs 38.3% ["poor"/"fair"]). Overall, only 20.0% of the experts deemed ChatGPT an accurate source of information, whereas 15.6% found it unreliable. Moreover, approximately 1 in 3 surgeons deemed ChatGPT responses unlikely to reduce patient-physician correspondence (31.1%) or not comparable to in-person surgeon responses (35.6%).
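As a check on the arithmetic above, the denominator of 1080 grades is consistent with 24 questions each graded by 45 respondents (24 × 45 = 1080). A minimal Python sketch, using only the counts reported in this abstract (no unreported grade breakdowns are assumed), reproduces the quoted percentages:

    # Illustrative check of the percentages reported in the Results; counts
    # are taken verbatim from the abstract.
    TOTAL_GRADES = 24 * 45  # 24 questions graded by 45 respondents = 1080

    reported_counts = {
        "quality: fair or good": 622,
        "utility: fair": 279,
        "utility: good": 344,
        "utility: poor": 129,
    }

    for label, count in reported_counts.items():
        print(f"{label}: {count}/{TOTAL_GRADES} = {100 * count / TOTAL_GRADES:.1f}%")

    # Prints 57.6%, 25.8%, 31.9%, and 11.9%, matching the Results.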

Conclusions: Although ChatGPT is a potential resource for patient education, its responses to common GI perioperative questions were deemed to be of only modest quality and utility to patients. In addition, the relative quality of AI responses varied markedly by procedure type.

Keywords: Artificial intelligence; ChatGPT; Informational resource; Large language models; Surgical care.

MeSH terms

  • Artificial Intelligence
  • Colectomy
  • Colorectal Neoplasms*
  • Humans
  • Pancreaticoduodenectomy
  • Surgeons*