Investigating the Usage of Formulae in Mathematical Answer Retrieval

Research output: Contribution to book/Conference proceedings/Anthology/Report, Conference contribution, Contributed, peer-review

Abstract

This work focuses on the task of Mathematical Answer Retrieval and studies the factors that a recent Transformer-Encoder-based Language Model (LM) uses to assess the relevance of an answer to a given mathematical question. We investigate three main factors: (1) the general influence of mathematical formulae, (2) the usage of structural information in those formulae, and (3) the overlap of variable names between questions and answers. The findings indicate that the LM for Mathematical Answer Retrieval relies mainly on shallow features such as the overlap of variables between questions and answers. Furthermore, we identified a malicious shortcut in the training data that hinders the usage of structural information; removing this shortcut improved the overall accuracy. We want to foster future research on how LMs are trained for Mathematical Answer Retrieval and provide a basic evaluation setup for existing models (repository: https://github.com/AnReu/math_analysis).

Details

Original language: English
Title of host publication: Advances in Information Retrieval
Editors: Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, Iadh Ounis
Publisher: Springer Science and Business Media B.V.
Pages: 247-261
Number of pages: 15
ISBN (electronic): 978-3-031-56027-9
ISBN (print): 978-3-031-56026-2
Publication status: Published - 2024
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14608 LNCS
ISSN: 0302-9743

Conference

Title: 46th European Conference on Information Retrieval
Abbreviated title: ECIR 2024
Conference number: 46
Duration: 24-28 March 2024
Location: Radisson Blu Hotel
City: Glasgow
Country: United Kingdom

External IDs

ORCID: /0000-0001-8107-2775/work/174431834
ORCID: /0000-0002-5985-4348/work/174432427
dblp: conf/ecir/ReuschGHL24

Keywords

  • Mathematical Information Retrieval, Transformer-Encoders