Transformer-Encoder-Based Mathematical Information Retrieval

Publication: Contribution to book/conference proceedings/collection › Contribution to conference proceedings › Contributed › Peer-reviewed

Abstract

Mathematical Information Retrieval (MIR) deals with the task of finding relevant documents that contain text and mathematical formulas. Retrieval systems should therefore be able to process not only natural language but also mathematical and scientific notation. In this work, we evaluate two transformer-encoder-based approaches on a question-answer retrieval task. Our pre-trained ALBERT model demonstrated competitive performance, ranking first for P'@10. Furthermore, we found that separating the pre-training data into chunks of text and formulas improved the overall performance on formula data.
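The abstract mentions separating pre-training data into text and formula chunks. A minimal sketch of such a splitter, assuming formulas are delimited by single dollar signs as in LaTeX-style corpora; the function name and delimiter convention are illustrative assumptions, not taken from the paper:

```python
import re


def split_text_and_formulas(document: str):
    """Split a document into alternating text and formula chunks.

    Assumes formulas are delimited by `$...$` (an illustrative
    convention; the paper's actual preprocessing may differ).
    Returns a list of (kind, content) tuples, kind in {"text", "formula"}.
    """
    chunks = []
    # re.split with one capture group yields text pieces at even
    # indices and the captured formula bodies at odd indices.
    for i, part in enumerate(re.split(r"\$([^$]+)\$", document)):
        kind = "formula" if i % 2 == 1 else "text"
        part = part.strip()
        if part:
            chunks.append((kind, part))
    return chunks
```

Each chunk could then be tokenized and fed to the encoder separately, so that formula tokens are never mixed into text segments during pre-training.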

Details

Original language: English
Title: Experimental IR Meets Multilinguality, Multimodality, and Interaction
Editors: Alberto Barrón-Cedeño, Giovanni Da San Martino, Guglielmo Faggioli, Nicola Ferro, Mirko Degli Esposti, Fabrizio Sebastiani, Craig Macdonald, Gabriella Pasi, Allan Hanbury, Martin Potthast
Publisher: Springer Science and Business Media B.V.
Pages: 175-189
Number of pages: 15
ISBN (electronic): 978-3-031-13643-6
ISBN (print): 978-3-031-13642-9
Publication status: Published - 2022
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13390 LNCS
ISSN: 0302-9743

Conference

Title: 13th Conference and Labs of the Evaluation Forum
Subtitle: Information Access Evaluation meets Multilinguality, Multimodality, and Visualization
Short title: CLEF 2022
Event number: 13
Dates: 5-8 September 2022
Venue: Università di Bologna
City: Bologna
Country: Italy

External IDs

ORCID /0000-0001-8107-2775/work/194824070

Keywords

  • ARQMath Lab, BERT-based Models, Information Retrieval, Mathematical Language Processing