Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers

Firas Khader; Jakob Nikolas Kather; Tianyu Han; Sven Nebelung; Christiane Kuhl; Johannes Stegmaier; Daniel Truhn

doi:10.1007/978-3-031-45676-3_42

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Firas Khader, RWTH Aachen University (Author)
Jakob Nikolas Kather, RWTH Aachen University (Author)
Tianyu Han, Medical Faculty Carl Gustav Carus (Author)
Sven Nebelung, RWTH Aachen University (Author)
Christiane Kuhl, RWTH Aachen University (Author)
Johannes Stegmaier, RWTH Aachen University (Author)
Daniel Truhn, RWTH Aachen University (Author)

Else Kröner Fresenius Center for Digital Health

Abstract

Whole-slide imaging allows high-resolution images of histological specimens to be captured and digitized. Automated analysis of such images using deep learning models is therefore in high demand. The transformer architecture has been proposed as a candidate for effectively leveraging this high-resolution information: the whole-slide image is partitioned into smaller image patches, and feature tokens are extracted from these patches. However, while the conventional transformer can process a large set of input tokens simultaneously, its computational demand scales quadratically with the number of input tokens and thus quadratically with the number of image patches. To address this problem, we propose a novel cascaded cross-attention network (CCAN), based on the cross-attention mechanism, that scales linearly with the number of extracted patches. Our experiments demonstrate that this architecture is at least on par with, and even outperforms, other attention-based state-of-the-art methods on two public datasets: on lung cancer (TCGA NSCLC), our model reaches a mean area under the receiver operating characteristic curve (AUC) of 0.970 ± 0.008, and on renal cancer (TCGA RCC), it reaches a mean AUC of 0.985 ± 0.004. Furthermore, we show that our proposed model is efficient in low-data regimes, making it a promising approach for analyzing whole-slide images in resource-limited settings. To foster research in this direction, we make our code publicly available on GitHub: https://github.com/FirasGit/cascaded_cross_attention.
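
The scaling argument in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that is in the GitHub repository linked above). It only assumes that a small set of M latent tokens attends to the N patch tokens, so the attention score matrix has M x N entries instead of the N x N entries of self-attention. The function names, the random latents and weights, and the reading of "cascaded" as a chain of stages with progressively fewer latents are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latents, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: M latents (queries) attend to N tokens.

    The score matrix has shape (M, N), so for a fixed M the cost grows
    linearly with the number of input tokens N.
    """
    q = latents @ w_q                        # (M, d)
    k = tokens @ w_k                         # (N, d)
    v = tokens @ w_v                         # (N, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (M, N)
    return softmax(scores, axis=-1) @ v      # (M, d)


def cascade(tokens, latent_sizes, dim, rng):
    """Hypothetical cascade: each stage compresses the previous stage's
    output into a smaller latent set. Latents and weights are random
    stand-ins for parameters that would be learned in a real model.
    """
    current = tokens
    for m in latent_sizes:
        latents = rng.standard_normal((m, dim))
        w_q, w_k, w_v = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
        )
        current = cross_attention(latents, current, w_q, w_k, w_v)
    return current


def score_entries(n_tokens, n_latents=None):
    """Number of attention scores: N*N for self-attention, M*N for cross-attention."""
    return n_tokens * (n_tokens if n_latents is None else n_latents)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch_tokens = rng.standard_normal((1000, 32))  # N = 1000 patch features
    summary = cascade(patch_tokens, latent_sizes=[64, 16], dim=32, rng=rng)
    print(summary.shape)
    print(score_entries(1000), score_entries(1000, 64))
```

In this sketch, doubling the number of patch tokens doubles the size of the first-stage score matrix, whereas a self-attention score matrix would quadruple.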

Details

Original language	English
Title of host publication	Machine Learning in Medical Imaging
Editors	Xiaohuan Cao, Xi Ouyang, Xuanang Xu, Islem Rekik, Zhiming Cui
Publisher	Springer Science and Business Media B.V.
Pages	417-426
Number of pages	10
ISBN (electronic)	978-3-031-45676-3
ISBN (print)	978-3-031-45675-6
Publication status	Published - 2024
Peer-reviewed	Yes

Publication series

Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14349 LNCS
ISSN	0302-9743

Workshop

Title	14th International Workshop on Machine Learning in Medical Imaging
Abbreviated title	MLMI 2023
Conference number	14
Duration	8 October 2023
Location	Vancouver Convention Center
City	Vancouver
Country	Canada
