The Impact of Access to Clinical Guidelines on LLM-Based Treatment Recommendations for Chronic Hepatitis B

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Robert Siepmann, University Hospital Aachen (Author)
  • Carolin Victoria Schneider, University Hospital Aachen (Author)
  • Marc Sebastian von der Stueck, University Hospital Aachen (Author)
  • Iakovos Amygdalos, University Hospital Aachen (Author)
  • Karsten Große, University Hospital Aachen (Author)
  • Kai Markus Schneider, University Hospital Aachen (Author)
  • Maike Rebecca Pollmanns, University Hospital Aachen (Author)
  • Mohamad Murad, University Hospital Aachen (Author)
  • Joel Joy, University Hospital Aachen (Author)
  • Elena Kabak, University Hospital Aachen (Author)
  • Marcella Ricardis May, RWTH Aachen University (Author)
  • Jan Clusmann, Else Kröner Fresenius Center for Digital Health, University Hospital Aachen (Author)
  • Christiane Kuhl, University Hospital Aachen (Author)
  • Sven Nebelung, University Hospital Aachen (Author)
  • Jakob Nikolas Kather, Department of Internal Medicine I, Else Kröner Fresenius Center for Digital Health, National Center for Tumor Diseases (NCT) Heidelberg (Author)
  • Daniel Truhn, University Hospital Aachen (Author)

Abstract

Background and Aims: Large language models (LLMs) can potentially support clinicians in their daily routine by providing easy access to information. Yet, they are plagued by stating incorrect facts and hallucinating when queried. Increasing the context by providing external databases while prompting LLMs may decrease the risk of misinformation. This study compares the influence of increased context on the coherence of LLM-based treatment recommendations with the recently updated WHO guidelines for the treatment of chronic hepatitis B (CHB).

Methods: GPT-4 was queried with five clinical case vignettes in two configurations: with and without additional context. The clinical vignettes were explicitly constructed so that treatment recommendations differed between the formerly applicable 2015 WHO guidelines and the updated 2024 ones. GPT-4 with context was provided access to the updated guidelines, while GPT-4 without context had to rely on its internal knowledge. GPT-4 was accessed only a few days after the release of the new WHO guidelines. Treatment recommendations were compared regarding guideline coherence, information inclusion, textual errors, wording clarity and preciseness by seven physicians.

Results: Using GPT-4 with context increased the coherence of the treatment recommendations with the new 2024 guidelines from 51% to 91% compared to GPT-4 without context. Similar trends were observed for all other categories, leading to an increase of 54% in preciseness and clarity, 24% in completeness of incorporating the case vignette information, and 12% in textual correctness.

Conclusions: If LLMs are consulted by clinicians for medical advice, they should be given access to external data sources to increase the chance of providing factually correct advice.
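The two query configurations described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Vignette` class, the `build_prompt` and `build_arms` helpers, the prompt wording, and the placeholder strings are all hypothetical, and no model is actually called. It only shows how the same vignette might be sent once without and once with guideline text prepended.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Vignette:
    """One clinical case vignette (hypothetical structure, not from the study)."""
    case_id: str
    text: str


def build_prompt(vignette: Vignette, guideline_text: Optional[str] = None) -> str:
    """Assemble the prompt for one vignette.

    The guideline excerpt is prepended only when it is supplied, which
    corresponds to the "with context" configuration. The wording of the
    question is an assumption made for illustration.
    """
    parts: List[str] = []
    if guideline_text is not None:
        parts.append("Reference guideline excerpt:\n" + guideline_text.strip())
    parts.append("Clinical case:\n" + vignette.text.strip())
    parts.append("Question: What treatment do you recommend for this patient?")
    return "\n\n".join(parts)


def build_arms(vignettes: List[Vignette], guideline_text: str) -> Dict[str, Dict[str, str]]:
    """Return both prompt variants for every vignette, keyed by case id."""
    return {
        v.case_id: {
            "without_context": build_prompt(v),
            "with_context": build_prompt(v, guideline_text),
        }
        for v in vignettes
    }


if __name__ == "__main__":
    # Placeholder inputs only; real vignettes and guideline text are not reproduced here.
    demo = build_arms(
        [Vignette("case-1", "placeholder vignette text")],
        "placeholder guideline excerpt",
    )
    print(demo["case-1"]["with_context"])
```

Keeping the case and question text identical in both variants means that any difference in the model's answers can be attributed to the added guideline context rather than to the prompt wording.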

Details

Original language: English
Article number: e70324
Journal: Liver International
Volume: 45
Issue number: 10
Publication status: Published - Oct 2025
Peer-reviewed: Yes

External IDs

PubMed: 40891225
ORCID: /0000-0002-3730-5348/work/198594708

Keywords

  • chronic hepatitis B, GPT, guideline coherence, LLM