Personalized Impression Generation for PET Reports Using Large Language Models

Xin Tie; Muheon Shin; Ali Pirasteh; Nevein Ibrahim; Zachary Huemann; Sharon M Castellino; Kara M Kelly; John Garrett; Junjie Hu; Steve Y Cho; Tyler J Bradshaw

doi:10.1007/s10278-024-00985-3

J Imaging Inform Med. 2024 Apr;37(2):471-488. doi: 10.1007/s10278-024-00985-3. Epub 2024 Feb 2.

Authors

Xin Tie¹ ², Muheon Shin¹, Ali Pirasteh¹ ², Nevein Ibrahim¹, Zachary Huemann¹, Sharon M Castellino³ ⁴, Kara M Kelly⁵ ⁶, John Garrett¹ ², Junjie Hu⁷ ⁸, Steve Y Cho¹ ⁹, Tyler J Bradshaw¹⁰

Affiliations

¹ Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
² Department of Medical Physics, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
³ Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, USA.
⁴ Aflac Cancer and Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, GA, USA.
⁵ Department of Pediatric Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.
⁶ Department of Pediatrics, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, NY, USA.
⁷ Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
⁸ Department of Computer Science, School of Computer, Data and Information Sciences, University of Wisconsin, Madison, WI, USA.
⁹ University of Wisconsin Carbone Comprehensive Cancer Center, Madison, WI, USA.
¹⁰ Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA. tbradshaw@wisc.edu.

Abstract

Large language models (LLMs) have shown promise in accelerating radiology reporting by summarizing clinical findings into impressions. However, automatic impression generation for whole-body PET reports presents unique challenges and has received little attention. Our study aimed to evaluate whether LLMs can create clinically useful impressions for PET reporting. To this end, we fine-tuned twelve open-source language models on a corpus of 37,370 retrospective PET reports collected from our institution. All models were trained using the teacher-forcing algorithm, with the report findings and patient information as input and the original clinical impressions as reference. An extra input token encoded the reading physician's identity, allowing models to learn physician-specific reporting styles. To compare the performances of different models, we computed various automatic evaluation metrics and benchmarked them against physician preferences, ultimately selecting PEGASUS as the top LLM. To evaluate its clinical utility, three nuclear medicine physicians assessed the PEGASUS-generated impressions and original clinical impressions across 6 quality dimensions (3-point scales) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. When physicians assessed LLM impressions generated in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08/5. On average, physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P = 0.41). In summary, our study demonstrated that personalized impressions generated by PEGASUS were clinically useful in most cases, highlighting its potential to expedite PET reporting by automatically drafting impressions.

Keywords: Informatics; Large Language Models; Natural Language Processing; Nuclear Medicine; Positron Emission Tomography; Radiology Report Summarization.
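
The abstract describes two training details: an extra input token that identifies the reading physician, and teacher forcing against the original clinical impression. The following is a minimal sketch of how such inputs might be assembled, not the authors' implementation. The token format `<physician_N>`, the `PetReport` fields, the field ordering, and the token IDs are all hypothetical, and the example report text is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PetReport:
    physician_id: int   # index of the reading physician (hypothetical encoding)
    patient_info: str   # brief clinical context
    findings: str       # findings section of the report
    impression: str     # original clinical impression, used as the reference


def physician_token(physician_id: int) -> str:
    """Return a hypothetical special token identifying the reading physician."""
    return f"<physician_{physician_id}>"


def build_source(report: PetReport) -> str:
    """Prepend the physician token to the patient information and findings."""
    parts = [
        physician_token(report.physician_id),
        report.patient_info.strip(),
        report.findings.strip(),
    ]
    return " ".join(parts)


def teacher_forcing_pair(target_ids: list[int], bos_id: int) -> tuple[list[int], list[int]]:
    """Build decoder inputs and labels for teacher forcing.

    The decoder sees the reference sequence shifted right by one position
    (starting with a begin-of-sequence ID) and is trained to predict the
    reference token at each step.
    """
    decoder_inputs = [bos_id] + target_ids[:-1]
    labels = list(target_ids)
    return decoder_inputs, labels


if __name__ == "__main__":
    # Invented example text, for illustration only.
    report = PetReport(
        physician_id=3,
        patient_info="Lymphoma, restaging.",
        findings="FDG-avid cervical lymph nodes.",
        impression="Findings consistent with residual disease.",
    )
    print(build_source(report))
    print(teacher_forcing_pair([11, 12, 13, 2], bos_id=0))
```

In a sketch like this, changing `physician_id` at inference time is what would steer the model toward a different physician's reporting style, since the rest of the input stays the same.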
