Transformer-Encoder-Based Mathematical Information Retrieval

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › Peer-reviewed

Abstract

Mathematical Information Retrieval (MIR) deals with the task of finding relevant documents that contain text and mathematical formulas. Retrieval systems should therefore be able to process not only natural language but also mathematical and scientific notation. In this work, we evaluate two transformer-encoder-based approaches on a question-answer retrieval task. Our pre-trained ALBERT model demonstrated competitive performance, ranking first for P'@10. Furthermore, we found that separating the pre-training data into chunks of text and formulas improved the overall performance on formula data.
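The idea of separating pre-training data into text and formula chunks can be illustrated with a minimal sketch. This is a hypothetical helper, not the authors' actual preprocessing code; it assumes formulas are delimited by `$...$`, whereas the paper's corpus format may differ.

```python
import re

# Formulas are assumed to be delimited by $...$ (an assumption for
# illustration; the actual corpus encoding may differ).
FORMULA_RE = re.compile(r"\$[^$]+\$")

def split_text_and_formulas(document: str) -> tuple[list[str], list[str]]:
    """Split one document into separate text chunks and formula chunks."""
    # Extract all formula spans.
    formulas = FORMULA_RE.findall(document)
    # The remaining spans between formulas become the text chunks.
    texts = [t.strip() for t in FORMULA_RE.split(document) if t.strip()]
    return texts, formulas

texts, formulas = split_text_and_formulas(
    "The quadratic formula $x = \\frac{-b \\pm \\sqrt{b^2-4ac}}{2a}$ "
    "solves equations of the form $ax^2 + bx + c = 0$"
)
print(texts)
print(formulas)
```

Each resulting list could then be fed to the tokenizer as its own pre-training stream, keeping natural-language and formula vocabulary statistics separate.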

Details

Original language: English
Title of host publication: Experimental IR Meets Multilinguality, Multimodality, and Interaction
Editors: Alberto Barrón-Cedeño, Giovanni Da San Martino, Guglielmo Faggioli, Nicola Ferro, Mirko Degli Esposti, Fabrizio Sebastiani, Craig Macdonald, Gabriella Pasi, Allan Hanbury, Martin Potthast
Publisher: Springer Science and Business Media B.V.
Pages: 175-189
Number of pages: 15
ISBN (electronic): 978-3-031-13643-6
ISBN (print): 978-3-031-13642-9
Publication status: Published - 2022
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13390 LNCS
ISSN: 0302-9743

Conference

Title: 13th Conference and Labs of the Evaluation Forum
Subtitle: Information Access Evaluation meets Multilinguality, Multimodality, and Visualization
Abbreviated title: CLEF 2022
Conference number: 13
Duration: 5 - 8 September 2022
Website:
Location: Università di Bologna
City: Bologna
Country: Italy

External IDs

ORCID /0000-0001-8107-2775/work/194824070

Keywords


  • ARQMath Lab, BERT-based Models, Information Retrieval, Mathematical Language Processing