CiteFusion: an ensemble framework for citation intent classification harnessing dual-model binary couples and SHAP analyses

Publication: Contribution to journal > Research article > Contributed > Peer-reviewed

Contributors

  • Lorenzo Paolini, Università di Bologna (Author)
  • Sahar Vahdati, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden), Institut für Angewandte Informatik (InfAI) e.V. (Author)
  • Angelo Di Iorio, Università di Bologna (Author)
  • Robert Wardenga, Institut für Angewandte Informatik (InfAI) e.V. (Author)
  • Ivan Heibi, Università di Bologna (Author)
  • Silvio Peroni, Università di Bologna (Author)

Abstract

Understanding the motivations underlying scholarly citations is essential to evaluate research impact and promote transparent scholarly communication. This study introduces CiteFusion, an ensemble framework designed to address the multi-class Citation Intent Classification task on two benchmark datasets: SciCite and ACL-ARC. The framework employs a one-vs-all decomposition of the multi-class task into class-specific binary subtasks, leveraging complementary pairs of independently tuned SciBERT and XLNet models for each citation intent. The outputs of these base models are aggregated through a feedforward neural network meta-classifier to reconstruct the original classification task. To enhance interpretability, SHAP (SHapley Additive exPlanations) is employed to analyze token-level contributions and interactions among base models, providing transparency into the classification dynamics of CiteFusion and insight into the kinds of misclassifications the ensemble makes. In addition, this work investigates the semantic role of structural context by incorporating section titles, as framing devices, into input sentences and assessing their positive impact on classification accuracy. CiteFusion ultimately demonstrates robust performance in imbalanced and data-scarce scenarios: experimental results show that it achieves state-of-the-art performance, with Macro-F1 scores of 89.60% on SciCite and 76.24% on ACL-ARC. Furthermore, to ensure interoperability and reusability, citation intents from both datasets' schemas are mapped to Citation Typing Ontology (CiTO) object properties, highlighting some overlaps. Finally, we describe and release a web-based application that classifies citation intents using the CiteFusion models developed on SciCite.
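
The one-vs-all decomposition, the stacking of base-model outputs, and the Macro-F1 metric named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The label set, the function names (`one_vs_all_targets`, `stack_features`, `macro_f1`), and the idea of passing each base model as a plain scoring callable are assumptions made for illustration. In the framework itself, the scorers would be fine-tuned SciBERT and XLNet binary classifiers, and the stacked vector would be the input to a feedforward meta-classifier.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed label set for illustration; the abstract does not list the classes.
LABELS = ["background", "method", "result"]

# A hypothetical base model: maps a citation sentence to a positive-class score.
Scorer = Callable[[str], float]


def one_vs_all_targets(labels: Sequence[str], classes: Sequence[str]) -> Dict[str, List[int]]:
    """Turn multi-class labels into one binary target list per class."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def stack_features(text: str, couples: Dict[str, Tuple[Scorer, Scorer]]) -> List[float]:
    """Concatenate the scores of both models in every class-specific couple.

    The result has 2 * len(couples) entries, in sorted class order, and is
    what a meta-classifier would consume to recover the multi-class decision.
    """
    features: List[float] = []
    for c in sorted(couples):
        for scorer in couples[c]:
            features.append(scorer(text))
    return features


def macro_f1(gold: Sequence[str], pred: Sequence[str], classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because Macro-F1 averages per-class scores without weighting by class frequency, it is a natural choice for the imbalanced settings the abstract mentions.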

Details

Original language: English
Pages (from-to): 5911-5981
Number of pages: 71
Journal: Scientometrics
Volume: 130
Issue number: 11
Publication status: Published - Nov. 2025
Peer-review status: Yes

Keywords

  • Citation Intent Classification, Ensemble Strategies, Explainable AI, Language Models