ChatGPT-4 shows high agreement in MRI protocol selection compared to board-certified neuroradiologists

Research output: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

  • Zeynep Bendella - University of Bonn, German Center for Neurodegenerative Diseases (DZNE) (Author)
  • Barbara Daria Wichtmann - University of Bonn, German Center for Neurodegenerative Diseases (DZNE) (Author)
  • Ralf Clauberg - University of Bonn (Author)
  • Vera C. Keil - Vrije Universiteit Amsterdam (VU) (Author)
  • Nils C. Lehnen - University of Bonn, German Center for Neurodegenerative Diseases (DZNE) (Author)
  • Robert Haase - University of Bonn, German Center for Neurodegenerative Diseases (DZNE) (Author)
  • Laura C. Sáez - Vrije Universiteit Amsterdam (VU), Hospital Universitario Son Llàtzer (Author)
  • Isabella C. Wiest - Else Kröner Fresenius Center for Digital Health, Universitätsmedizin Mannheim (Author)
  • Jakob Nikolas Kather - Else Kröner Fresenius Center for Digital Health (Author)
  • Christoph Endler - University of Bonn Medical Center (Author)
  • Alexander Radbruch - University of Bonn, German Center for Neurodegenerative Diseases (DZNE) (Author)
  • Daniel Paech - University of Bonn, Harvard Medical School (HMS) (Author)
  • Katerina Deike - University of Bonn, Massachusetts General Hospital (Author)

Abstract

Objectives: The aim of this study was to determine whether ChatGPT-4 can correctly suggest MRI protocols and additional MRI sequences based on real-world Radiology Request Forms (RRFs), and to investigate its ability to suggest time-saving protocols.

Material & methods: Retrospectively, 1,001 RRFs from our Department of Neuroradiology (in-house dataset), 200 RRFs from an independent Department of General Radiology (independent dataset) and 300 RRFs from an external, foreign Department of Neuroradiology (external dataset) were included. Patients’ age, sex and clinical information were extracted from the RRFs and used to prompt ChatGPT-4 to choose an adequate MRI protocol from predefined institutional lists. Four independent raters then assessed its performance. Additionally, ChatGPT-4 was tasked with creating case-specific protocols aimed at saving time.

Results: In the in-house dataset, 2 and 7 of the 1,001 protocol suggestions made by ChatGPT-4 were rated “unacceptable” by readers 1 and 2, respectively. No protocol suggestion was rated “unacceptable” in either the independent or the external dataset. Inter-reader agreement (Cohen's weighted κ) ranged from 0.88 to 0.98 (each p < 0.001). ChatGPT-4's freely composed protocols were approved in 766/1,001 (76.5%) cases of the in-house dataset and 140/300 (46.67%) cases of the external dataset, with mean (± standard deviation) time savings of 3:51 (± 2:40) and 2:59 (± 3:42) (minutes:seconds) per adopted in-house and external MRI protocol, respectively.

Conclusion: ChatGPT-4 demonstrated very high agreement with board-certified (neuro-)radiologists in selecting MRI protocols and was able to suggest approved time-saving protocols from the set of available sequences.
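Cohen's weighted κ, the agreement statistic reported above, measures how far two raters agree on an ordinal scale beyond chance, penalizing large disagreements more than near misses: κ_w = 1 − Σ w_ij p_ij / Σ w_ij p_i· p_·j, where p_ij are observed joint rating proportions, p_i· and p_·j the marginals, and w_ij disagreement weights. The following is a minimal Python sketch, not the study's analysis code. The abstract does not state the weighting scheme, so linear weights are an assumption (quadratic weights are offered as an option), the function name `weighted_kappa` is hypothetical, no p-value is computed, and the three-level rating scale in the usage lines is hypothetical apart from the “unacceptable” level named in the abstract.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered category scale.

    Sketch only: linear weights are an assumption, not taken from the study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}  # category -> ordinal position
    n = len(rater_a)

    # Observed joint proportions p_ij (rows: rater A, columns: rater B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant pair.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d  # otherwise quadratic

    observed_dis = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa undefined: no disagreement expected by chance")
    return 1 - observed_dis / expected_dis


# Hypothetical ratings on a hypothetical three-level scale.
scale = ["unacceptable", "acceptable", "optimal"]
reader_1 = ["optimal", "acceptable", "optimal", "optimal"]
reader_2 = ["optimal", "acceptable", "optimal", "acceptable"]
print(round(weighted_kappa(reader_1, reader_2, scale), 3))  # 0.5
```

Passing the categories explicitly keeps their clinical order intact; inferring the scale from the observed labels would drop levels no rater used and could scramble the ordering that the weights depend on.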

Details

Original language: English
Article number: 112416
Journal: European Journal of Radiology
Volume: 193
Publication status: Published - Dec 2025
Peer-reviewed: Yes

External IDs

PubMed: 40961911
ORCID: /0000-0002-3730-5348/work/198594711

Keywords

  • ChatGPT-4, Large language model (LLM), MRI protocol, Radiology request form