TU_DBS in the ARQMath Lab 2021, CLEF

Publication: Contribution to book/conference proceedings/anthology/report, Conference contribution, Contributed, Peer-reviewed

Contributors

Abstract

Mathematical Information Retrieval (MIR) deals with the task of finding relevant documents that contain text and mathematical formulas. Retrieval systems should therefore be able to process not only natural language but also mathematical and scientific notation. The goal of this work is to review the participation of our team in the ARQMath 2021 Lab, where two different approaches based on ALBERT and ColBERT were applied to a Question Answer Retrieval task and a Formula Similarity task. The ALBERT-based classification approach achieved competitive results on the first task. We found that by pre-training on data separated into chunks of text and formulas, the model performed better on formula data. This way of pre-training could also be beneficial for the Formula Search task.

Details

Original language: English
Title: Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, September 21st - to - 24th, 2021
Pages: 107-124
Number of pages: 18
Volume: 2936
Publication status: Published - 2021
Peer-review status: Yes

Publication series

Series: CEUR Workshop Proceedings
Volume: 2936
ISSN: 1613-0073

Conference

Title: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021
Duration: 21 - 24 September 2021
City: Virtual, Bucharest
Country: Romania

External IDs

Scopus 85113430502
ORCID /0000-0001-8107-2775/work/142253437

Keywords

ASJC Scopus subject areas

Keywords

  • BERT-based models, Information retrieval, Mathematical language processing