A software pipeline for medical information extraction with large language models, open source and suitable for oncology

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Abstract

In medical oncology, text data, such as clinical letters or procedure reports, is stored in an unstructured way, making quantitative analysis difficult. Manual review or structured information retrieval is time-consuming and costly, whereas Large Language Models (LLMs) offer new possibilities in natural language processing for structured Information Extraction (IE) from medical free text. This protocol describes a workflow (LLM-AIx) for extracting predefined clinical entities from unstructured oncology text using privacy-preserving LLMs. It addresses a key barrier in clinical research and care by enabling efficient information extraction to support decision-making and large-scale data analysis. It runs on local hospital infrastructure, eliminating the need to transfer patient data externally. We demonstrate its utility on 100 pathology reports from The Cancer Genome Atlas (TCGA) for TNM stage extraction. LLM-AIx requires no programming skills and offers a user-friendly interface for rapid, structured data extraction from clinical free text.

Details

Original language: English
Article number: 313
Journal: npj Precision Oncology
Volume: 9
Issue number: 1
Publication status: Published - Dec. 2025
Peer review status: Yes

External IDs

ORCID /0009-0005-7029-0028/work/195442382
ORCID /0000-0002-3730-5348/work/198594712
