A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports

Daniel Truhn; Christian D. Weber; Benedikt J. Braun; Keno Bressem; Jakob N. Kather; Christiane Kuhl; Sven Nebelung

doi:10.1038/s41598-023-47500-2

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Daniel Truhn, Rheinisch-Westfälische Technische Hochschule Aachen (Author)
Christian D. Weber, Rheinisch-Westfälische Technische Hochschule Aachen (Author)
Benedikt J. Braun, Eberhard Karls Universität Tübingen (Author)
Keno Bressem, Charité – Universitätsmedizin Berlin (Author)
Jakob N. Kather, Else Kröner Fresenius Zentrum für Digitale Gesundheit, Medizinische Klinik und Poliklinik I, Rheinisch-Westfälische Technische Hochschule Aachen, Nationales Zentrum für Tumorerkrankungen (NCT) Heidelberg (Author)
Christiane Kuhl, Rheinisch-Westfälische Technische Hochschule Aachen (Author)
Sven Nebelung, Rheinisch-Westfälische Technische Hochschule Aachen (Author)

Abstract

Large language models (LLMs) have shown potential in various applications, including clinical practice. However, their accuracy and utility in providing treatment recommendations for orthopedic conditions remain to be investigated. Thus, this pilot study aims to evaluate the validity of treatment recommendations generated by GPT-4 for common knee and shoulder orthopedic conditions using anonymized clinical MRI reports. A retrospective analysis was conducted using 20 anonymized clinical MRI reports, with varying severity and complexity. Treatment recommendations were elicited from GPT-4 and evaluated by two board-certified specialty-trained senior orthopedic surgeons. Their evaluation focused on semiquantitative gradings of accuracy and clinical utility and potential limitations of the LLM-generated recommendations. GPT-4 provided treatment recommendations for 20 patients (mean age, 50 years ± 19 [standard deviation]; 12 men) with acute and chronic knee and shoulder conditions. The LLM produced largely accurate and clinically useful recommendations. However, limited awareness of a patient’s overall situation, a tendency to incorrectly appreciate treatment urgency, and largely schematic and unspecific treatment recommendations were observed and may reduce its clinical usefulness. In conclusion, LLM-based treatment recommendations are largely adequate and not prone to ‘hallucinations’, yet inadequate in particular situations. Critical guidance by healthcare professionals is obligatory, and independent use by patients is discouraged, given the dependency on precise data input.

Details

Original language	English
Article number	20159
Number of pages	9
Journal	Scientific Reports
Volume	13
Issue number	1
Publication status	Published - 17 Nov 2023
Peer-review status	Yes

External IDs

PubMed	37978240

Keywords

ASJC Scopus subject areas

General

Keywords

Medicine, Magnetic Resonance Imaging, Pilot Projects, Language, Humans, Middle Aged, Musculoskeletal Diseases, Male, Retrospective Studies

Library keywords

610 Medicine and health

TU Dresden Research Portal

Related content

Author Correction: A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports (Scientific Reports, (2023), 13, 1, (20159), 10.1038/s41598-023-47500-2)