Performance of an Open-Source Large Language Model in Extracting Information from Free-Text Radiology Reports

Bastien Le Guellec; Alexandre Lefèvre; Charlotte Geay; Lucas Shorten; Cyril Bruge; Lotfi Hacein-Bey; Philippe Amouyel; Jean-Pierre Pruvo; Grégory Kuchcinski; Aghiles Hamroun

doi:10.1148/ryai.230364

Radiol Artif Intell. 2024 May 8:e230364. doi: 10.1148/ryai.230364. Online ahead of print.

Affiliation

From the Department of Neuroradiology (B.L.G., A.L., C.B., J.P.P., G.K.), Department of Public Health (B.L.G., P.A., A.H.), and INCLUDE-Hospital data warehouse (C.G., L.S.), CHU Lille-Univ Lille, Rue Emile Laine, 59000 Lille, France; Department of Radiology, UC Davis Health, Sacramento, Calif (L.H.B.); Univ Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1167-RID-AGE-Facteurs de risque et déterminants moléculaires des maladies liées au vieillissement, Lille, France (P.A., A.H.); Inserm, U1172-LilNCog-Lille Neuroscience & Cognition, Univ Lille, Lille, France (J.P.P., G.K.); and UAR 2014-US 41-PLBS-Plateformes Lilloises en Biologie & Santé, Univ Lille, Lille, France (J.P.P., G.K.).

PMID: 38717292

Abstract

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.

Purpose

To assess the performance of a local open-source large language model (LLM) on various information extraction tasks from real-life emergency brain MRI reports.

Materials and Methods

All consecutive emergency brain MRI reports written in 2022 at a French quaternary care center were retrospectively reviewed. Two radiologists identified MRI examinations performed for headache. Four radiologists classified report conclusions as normal or abnormal, and each abnormality was labeled as either headache-causing or incidental. Vicuna, an open-source LLM, performed the same tasks. Vicuna's performance was evaluated using the radiologists' consensus as the reference standard.

Results

Of the 2398 reports written during the study period, radiologists identified 595 that mentioned headache in the clinical indication (median patient age, 35 years [IQR, 26-51]; 403 of 595 [68%] female). A positive finding was reported in 227 of 595 cases (38%), 136 of which could explain the headache. Against the reference standard, the LLM achieved a sensitivity of 98% (583/595; 95% CI: 97-99) and a specificity of 99% (1791/1803; 95% CI: 99-100) for detecting headache in the clinical context; 99% (514/517; 95% CI: 98-100) and 99% (68/69; 95% CI: 92-100) for detecting contrast medium injection; 97% (219/227; 95% CI: 93-99) and 99% (364/368; 95% CI: 97-100) for classifying studies as normal or abnormal; and 88% (120/136; 95% CI: 82-93) and 73% (66/91; 95% CI: 62-81) for causal inference between MRI findings and headache.

Conclusion

An open-source LLM was able to extract information from free-text radiology reports with excellent accuracy, without requiring further training.

©RSNA, 2024.
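Each result in the abstract is a sensitivity/specificity pair scored against the radiologists' consensus, with a 95% confidence interval. The Python sketch below illustrates that arithmetic under stated assumptions. It uses Wilson score intervals, a common choice for binomial proportions; the abstract does not say which interval method the authors used, so this is an assumption. The names `Estimate`, `wilson_interval`, `estimate`, and `sensitivity_specificity` are hypothetical. The label vectors are synthetic and only reproduce the published headache-detection counts (583/595 and 1791/1803); they are not study data.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class Estimate:
    """A proportion (e.g., sensitivity) with its confidence interval."""
    successes: int
    total: int
    low: float
    high: float

    @property
    def point(self) -> float:
        return self.successes / self.total

    def __str__(self) -> str:
        return (f"{self.point:.0%} ({self.successes}/{self.total}) "
                f"[95% CI: {self.low:.0%}, {self.high:.0%}]")


def wilson_interval(successes: int, total: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate(successes: int, total: int) -> Estimate:
    low, high = wilson_interval(successes, total)
    return Estimate(successes, total, low, high)


def sensitivity_specificity(predicted: list[bool],
                            reference: list[bool]) -> tuple[Estimate, Estimate]:
    """Score per-report LLM labels against reference-standard labels (True = positive)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    # Sensitivity is computed over reference-positive reports,
    # specificity over reference-negative reports.
    return estimate(tp, tp + fn), estimate(tn, tn + fp)


# Synthetic labels that only reproduce the published headache-detection counts.
reference = [True] * 595 + [False] * 1803
predicted = [True] * 583 + [False] * 12 + [False] * 1791 + [True] * 12

sens, spec = sensitivity_specificity(predicted, reference)
print("sensitivity:", sens)  # sensitivity: 98% (583/595) [95% CI: 97%, 99%]
print("specificity:", spec)  # specificity: 99% (1791/1803) [95% CI: 99%, 100%]
```

Because sensitivity and specificity are computed over disjoint subsets of reports, the two denominators for headache detection (595 and 1803) add up to the 2398 reports reviewed. For the causal-inference sensitivity (120/136), `wilson_interval` gives roughly 0.817 to 0.926, consistent with the reported 82-93; other intervals in the abstract may differ by a percentage point if the authors used a different method, such as an exact (Clopper-Pearson) interval.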