Chat GPT-4 shows high agreement in MRI protocol selection compared to board-certified neuroradiologists

Publication: Contribution to journal, Research article, Contributed, Peer-reviewed

Contributors

  • Zeynep Bendella, Universität Bonn, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) (Author)
  • Barbara Daria Wichtmann, Universität Bonn, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) (Author)
  • Ralf Clauberg, Universität Bonn (Author)
  • Vera C. Keil, Vrije Universiteit Amsterdam (VU) (Author)
  • Nils C. Lehnen, Universität Bonn, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) (Author)
  • Robert Haase, Universität Bonn, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) (Author)
  • Laura C. Sáez, Vrije Universiteit Amsterdam (VU), Hospital Universitario Son Llàtzer (Author)
  • Isabella C. Wiest, Else Kröner Fresenius Zentrum für Digitale Gesundheit, Universitätsmedizin Mannheim (Author)
  • Jakob Nikolas Kather, Else Kröner Fresenius Zentrum für Digitale Gesundheit (Author)
  • Christoph Endler, Universitätsklinikum Bonn (Author)
  • Alexander Radbruch, Universität Bonn, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) (Author)
  • Daniel Paech, Universität Bonn, Harvard Medical School (HMS) (Author)
  • Katerina Deike, Universität Bonn, Massachusetts General Hospital (Author)

Abstract

Objectives: This study aimed to determine whether ChatGPT-4 can correctly suggest MRI protocols and additional MRI sequences based on real-world radiology request forms (RRFs), and to investigate its ability to suggest time-saving protocols.

Material & methods: Retrospectively, 1,001 RRFs from our Department of Neuroradiology (in-house dataset), 200 RRFs from an independent Department of General Radiology (independent dataset), and 300 RRFs from an external, foreign Department of Neuroradiology (external dataset) were included. Patients’ age, sex, and clinical information were extracted from the RRFs and used to prompt ChatGPT-4 to choose an adequate MRI protocol from predefined institutional lists. Four independent raters then assessed its performance. Additionally, ChatGPT-4 was tasked with creating case-specific protocols aimed at saving time.

Results: In the in-house dataset, readers 1 and 2 rated 2 and 7 of ChatGPT-4’s 1,001 protocol suggestions “unacceptable”, respectively. No protocol suggestion was rated “unacceptable” in either the independent or the external dataset. For inter-reader agreement, Cohen’s weighted κ ranged from 0.88 to 0.98 (each p < 0.001). ChatGPT-4’s freely composed protocols were approved in 766/1,001 (76.5 %) cases of the in-house dataset and 140/300 (46.67 %) cases of the external dataset, with mean time savings (± standard deviation) of 3:51 (±2:40) and 2:59 (±3:42) minutes:seconds per adopted in-house and external MRI protocol, respectively.

Conclusion: ChatGPT-4 demonstrated very high agreement with board-certified (neuro-)radiologists in selecting MRI protocols and was able to suggest approved time-saving protocols from the set of available sequences.
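For readers unfamiliar with the agreement statistic reported in the results, the following is a minimal Python sketch of Cohen's weighted kappa for two raters on an ordinal scale. The abstract does not state which weighting scheme was used or how the rating scale was defined beyond the "unacceptable" level, so the `linear`/`quadratic` option, the function name `weighted_kappa`, the category labels, and the toy ratings are all assumptions for illustration and are not data from the study.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    `categories` lists the scale from lowest to highest level.
    `weights` selects linear or quadratic disagreement weights.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)

    def disagreement(i, j):
        # 0 for identical ratings, 1 for ratings at opposite ends of the scale.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Joint counts of (rater A level, rater B level) and per-rater marginals.
    joint = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marg_a = Counter(index[a] for a in ratings_a)
    marg_b = Counter(index[b] for b in ratings_b)

    # Weighted disagreement actually observed.
    observed = sum(disagreement(i, j) * c / n for (i, j), c in joint.items())
    # Weighted disagreement expected if the raters were independent.
    expected = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j] / (n * n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        # Both raters used one and the same category for every case.
        return 1.0
    return 1.0 - observed / expected


# Hypothetical three-level scale and toy ratings (NOT study data).
scale = ["unacceptable", "acceptable", "optimal"]
reader_1 = ["optimal", "optimal", "acceptable", "optimal", "unacceptable"]
reader_2 = ["optimal", "acceptable", "acceptable", "optimal", "acceptable"]
print(weighted_kappa(reader_1, reader_2, scale))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; the weights penalise disagreements more the further apart the two ratings lie on the scale.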

Details

Original language: English
Article number: 112416
Journal: European journal of radiology
Volume: 193
Publication status: Published - Dec. 2025
Peer-reviewed: Yes

External IDs

PubMed 40961911
ORCID /0000-0002-3730-5348/work/198594711

Keywords

  • ChatGPT-4, Large language model (LLM), MRI protocol, Radiology request form