In-context learning enables multimodal large language models to classify cancer pathology images

Dyke Ferber; Georg Wölflein; Isabella C. Wiest; Marta Ligero; Srividhya Sainath; Narmin Ghaffari Laleh; Omar S.M. El Nahhas; Gustav Müller-Franzes; Dirk Jäger; Daniel Truhn; Jakob Nikolas Kather

doi:10.1038/s41467-024-51465-9

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

Dyke Ferber - Else Kröner Fresenius Center for Digital Health, Heidelberg University, National Center for Tumor Diseases (NCT) Heidelberg (Author)
Georg Wölflein - University of St Andrews (Author)
Isabella C. Wiest - Else Kröner Fresenius Center for Digital Health, Universitätsmedizin Mannheim (Author)
Marta Ligero - Else Kröner Fresenius Center for Digital Health (Author)
Srividhya Sainath - Else Kröner Fresenius Center for Digital Health (Author)
Narmin Ghaffari Laleh - Else Kröner Fresenius Center for Digital Health (Author)
Omar S.M. El Nahhas - Else Kröner Fresenius Center for Digital Health (Author)
Gustav Müller-Franzes - University Hospital Aachen (Author)
Dirk Jäger - National Center for Tumor Diseases (NCT) Heidelberg, University Hospital Heidelberg (Author)
Daniel Truhn - University Hospital Aachen (Author)
Jakob Nikolas Kather - Department of Internal Medicine I, Else Kröner Fresenius Center for Digital Health, National Center for Tumor Diseases (NCT) Heidelberg, University Hospital Heidelberg (Author)

Abstract

Medical image classification requires labeled, task-specific datasets, which are used to train deep learning networks de novo or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from within prompts, bypassing the need for parameter updates. Yet, in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning on three cancer histopathology tasks of high importance: classification of tissue subtypes in colorectal cancer, colon polyp subtyping, and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while only requiring a minimal number of samples. In summary, this study demonstrates that large vision language models trained on non-domain-specific data can be applied out of the box to solve medical image-processing tasks in histopathology. This democratizes access to generalist AI models for medical experts without a technical background, especially in areas where annotated data is scarce.

Details

Original language	English
Article number	10104
Number of pages	12
Journal	Nature Communications
Volume	15 (2024)
Issue number	1
Publication status	Published - 21 Nov 2024
Peer-reviewed	Yes

External IDs

PubMed	39572531

Keywords

Sustainable Development Goals

SDG 3 - Good Health and Well-being
