XAIEV – A Framework for the Evaluation of XAI-Algorithms for Image Classification

Carsten Knoll; Julian Ullrich; Thomas Manjooran; Kilian Göller; Haadia Amjad; Ronald Tetzlaff; Steffen Seitz

doi:10.1007/978-3-032-08327-2_12

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to book/anthology/report › Contributed › Peer-reviewed

Contributors

Carsten Knoll, Chair of Fundamentals of Electronics (author)
Julian Ullrich, Heinrich Heine Universität Düsseldorf (author)
Thomas Manjooran, Technische Universität Dresden (author)
Kilian Göller, Chair of Fundamentals of Electronics (author)
Haadia Amjad, Chair of Fundamentals of Electronics (author)
Ronald Tetzlaff, Chair of Fundamentals of Electronics (author)
Steffen Seitz, Chair of Fundamentals of Electronics (author)

Abstract

Convolutional Neural Networks (CNNs), such as VGG and ResNet, have been widely used for image classification for several years. Numerous explainable AI (XAI) algorithms, including Grad-CAM and XRAI, have been proposed to enhance interpretability in this domain. However, a persistent challenge lies in quantitatively comparing different XAI algorithms, variants of the same algorithm, or combinations of CNN models and XAI algorithms. In this work, we introduce XAIEV – a versatile framework for computationally evaluating the quality of saliency-map-based XAI algorithms. This framework includes (A) a benchmark dataset (traffic sign recognition) with a known ground truth and (B) a software toolbox designed to facilitate the evaluation pipeline. The pipeline consists of four steps: (1) training the model, (2) applying XAI algorithms to generate weighted saliency maps, (3) generating new test images with varying percentages of “important” pixels removed or retained, and (4) statistically evaluating accuracy changes on these test images and comparing the results to the ground truth. Based on this statistical evaluation, we define an Accuracy-Sensitivity Quotient (ASQ) as a novel quality metric for XAI algorithms applied to image classification. Using the XAIEV framework, we compare various combinations of CNN architectures (“SimpleCNN” (custom model), VGG, ResNet, ConvNeXt) with multiple XAI algorithms (Grad-CAM, XRAI, LIME, PRISM). Our numerical results reveal that the performance of XAI algorithms is highly dependent on the underlying CNN model.
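
Step (3) of the pipeline can be illustrated with a minimal sketch. It is not taken from the XAIEV toolbox: the function name `mask_by_saliency`, the pixel-wise ranking, and the constant fill value for removed pixels are assumptions made only for illustration, and the actual implementation may differ. Given an image and a saliency map of the same height and width, the sketch either blanks the most salient pixels or keeps only those pixels.

```python
import numpy as np


def mask_by_saliency(image, saliency, percent, mode="remove", fill_value=0):
    """Hypothetical helper (not part of XAIEV).

    mode="remove": overwrite the top `percent` % most salient pixels.
    mode="retain": keep only those pixels and overwrite everything else.
    """
    saliency = np.asarray(saliency, dtype=float)
    if image.shape[:2] != saliency.shape:
        raise ValueError("image and saliency map must share height and width")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in [0, 100]")
    if mode not in ("remove", "retain"):
        raise ValueError("mode must be 'remove' or 'retain'")

    n_pixels = saliency.size
    k = int(round(n_pixels * percent / 100))

    # Rank pixels from most to least salient; a stable sort keeps ties reproducible.
    order = np.argsort(-saliency.ravel(), kind="stable")
    important = np.zeros(n_pixels, dtype=bool)
    important[order[:k]] = True
    important = important.reshape(saliency.shape)

    # Choose which pixels get overwritten (assumption: a constant fill value).
    to_fill = important if mode == "remove" else ~important
    out = image.copy()
    out[to_fill] = fill_value
    return out


# Toy data: a 4x4 RGB image with non-zero pixels and a saliency map that
# increases row by row, so the top 25 % of pixels are exactly the last row.
rng = np.random.default_rng(0)
image = rng.integers(1, 256, size=(4, 4, 3), dtype=np.uint8)
saliency = np.arange(16, dtype=float).reshape(4, 4)

removed = mask_by_saliency(image, saliency, 25, mode="remove")
retained = mask_by_saliency(image, saliency, 25, mode="retain")
```

Running a trained classifier on such image variants for several percentages yields the accuracy changes that step (4) evaluates statistically.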

Details

Original language	English
Title	Explainable Artificial Intelligence - 3rd World Conference, xAI 2025, Proceedings
Editors	Riccardo Guidotti, Ute Schmid, Luca Longo
Pages	250–263
Number of pages	14
ISBN (electronic)	978-3-032-08327-2
Publication status	Published - 2026
Peer-review status	Yes

Publication series

Series	CCIS
Volume	2578
ISSN	1865-0929

External IDs

ORCID	/0000-0001-7436-0103/work/196665455
ORCID	/0000-0002-8389-8869/work/196678386
Scopus	105020237494

Keywords

Accuracy Sensitivity Quotient, CNN, Image Classification, Quality Metric, XAI-evaluation
