Capabilities of ChatGPT-3.5 as a Urological Triage System

Christopher Hirtsiefer; Tim Nestler; Johanna Eckrich; Henrieke Beverungen; Carolin Siech; Cem Aksoy; Marianne Leitsmann; Martin Baunacke; Annemarie Uhlig

doi:10.1016/j.euros.2024.10.015

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Christopher Hirtsiefer, Klinik und Poliklinik für Urologie (Author)
Tim Nestler, Bundeswehrkrankenhaus Koblenz (Author)
Johanna Eckrich, Universität Bonn (Author)
Henrieke Beverungen, St. Elisabeth-Krankenhaus Leipzig (Author)
Carolin Siech, Universitätsklinikum Frankfurt (Author)
Cem Aksoy, Universitätsklinikum Gießen und Marburg GmbH (Author)
Marianne Leitsmann, Medizinische Universität Graz, aQua-Institut GmbH (Author)
Martin Baunacke, Klinik und Poliklinik für Urologie (Author)
Annemarie Uhlig, Georg-August-Universität Göttingen (Author)

Abstract

Background and objective: Patients struggle to classify symptoms, which hinders timely medical presentation. With 35–75% of patients seeking information online before consulting a health care professional, generative language–based artificial intelligence (AI), exemplified by ChatGPT-3.5 (GPT-3.5) from OpenAI, has emerged as an important source. The aim of our study was to evaluate the role of GPT-3.5 in triaging acute urological conditions to address a gap in current research.
Methods: We assessed GPT-3.5 performance in providing urological differential diagnoses (DD) and recommending a course of action (CoA). Six acute urological pathologies were identified for evaluation. Lay descriptions, sourced from patient forums, formed the basis for 472 queries that were independently entered by nine urologists. We evaluated the output in terms of compliance with the European Association of Urology (EAU) guidelines, the quality of the patient information using the validated DISCERN questionnaire, and a linguistic analysis.
Key findings and limitations: The median GPT-3.5 ratings were 4/5 for DD and CoA, and 3/5 for overall information quality. English outputs received higher median ratings than German outputs for DD (4.27 vs 3.95; p < 0.001) and CoA (4.25 vs 4.05; p < 0.005). There was no difference in performance between urgent and non-urgent cases. Analysis of the information quality revealed notable underperformance for source indication, risk assessment, and influence on quality of life.
Conclusion and clinical implications: Our results highlight the potential of GPT-3.5 as a triage system for offering individualized, empathetic advice mostly aligned with the EAU guidelines, outscoring other online information. Relevant shortcomings in terms of information quality, especially for risk assessment, need to be addressed to enhance reliability. Broader transparency and quality improvements are needed before integration into patient care, primarily in English-speaking settings.
Patient summary: We looked at the performance of ChatGPT-3.5 for patients seeking urology advice. We entered more than 400 German and English inputs and assessed the possible diagnoses suggested by this artificial intelligence tool. ChatGPT-3.5 scored well in providing a complete list of possible diagnoses and recommending a course of action mostly in line with current guidelines. The quality of the information was good overall, but missing and unclear sources for the information can be a problem.

Details

Original language	English
Pages (from-to)	148-153
Number of pages	6
Journal	European Urology Open Science
Volume	70
Publication status	Published - Dec 2024
Peer-review status	Yes

Research Portal of TU Dresden
