TU_DBS in the ARQMath Lab 2021, CLEF

Research output: Contribution to book/Conference proceedings/Anthology/Report > Conference contribution > Contributed > peer-review

Contributors

Abstract

Mathematical Information Retrieval (MIR) deals with the task of finding relevant documents that contain text and mathematical formulas. Retrieval systems should therefore be able to process not only natural language but also mathematical and scientific notation. This work reviews the participation of our team in the ARQMath 2021 Lab, where two approaches, based on ALBERT and ColBERT, were applied to a Question Answer Retrieval task and a Formula Similarity task. The ALBERT-based classification approach achieved competitive results on the first task. We found that pre-training on data separated into chunks of text and formulas improved the model's performance on formula data. This way of pre-training could also be beneficial for the Formula Search task.

Details

Original language: English
Title of host publication: Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, September 21st - to - 24th, 2021
Pages: 107-124
Number of pages: 18
Volume: 2936
Publication status: Published - 2021
Peer-reviewed: Yes

Publication series

Series: CEUR Workshop Proceedings
Volume: 2936
ISSN: 1613-0073

Conference

Title: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021
Duration: 21 - 24 September 2021
City: Virtual, Bucharest
Country: Romania

External IDs

Scopus: 85113430502
ORCID: /0000-0001-8107-2775/work/142253437

Keywords

  • BERT-based models, Information retrieval, Mathematical language processing