Transformer-Encoder-Based Mathematical Information Retrieval
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Mathematical Information Retrieval (MIR) deals with the task of finding relevant documents that contain text and mathematical formulas. Retrieval systems should therefore be able to process not only natural language but also mathematical and scientific notation. In this work, we evaluate two transformer-encoder-based approaches on a question-answer retrieval task. Our pre-trained ALBERT model demonstrated competitive performance, ranking first for P'@10. Furthermore, we found that separating the pre-training data into chunks of text and formulas improved the overall performance on formula data.
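For readers unfamiliar with the P'@10 metric named in the abstract, the following is a minimal sketch of how a "prime" precision score is commonly computed in the ARQMath setting: unjudged documents are removed from the ranking before precision at 10 is taken. The function name `p_prime_at_k`, the 0 to 3 grading scale, and the threshold `min_relevant_grade=2` are assumptions for illustration and are not taken from this paper.

```python
from typing import Dict, List


def p_prime_at_k(ranking: List[str], judgments: Dict[str, int],
                 k: int = 10, min_relevant_grade: int = 2) -> float:
    """Precision at k computed over judged documents only.

    ranking:   document ids in ranked order, best first.
    judgments: graded relevance per judged document id
               (assumed scale: 0 = not relevant ... 3 = highly relevant).
    """
    # Drop unjudged documents while keeping the original rank order.
    judged = [doc_id for doc_id in ranking if doc_id in judgments]
    top_k = judged[:k]
    # Binarize the graded judgments (assumed threshold, see lead-in).
    hits = sum(1 for doc_id in top_k if judgments[doc_id] >= min_relevant_grade)
    # Divide by k rather than len(top_k) so short rankings are not rewarded.
    return hits / k


# Hypothetical toy data: "u1" and "u2" are unjudged and therefore skipped.
example_ranking = ["a", "u1", "b", "c", "u2", "d"]
example_judgments = {"a": 3, "b": 0, "c": 2, "d": 1}
example_score = p_prime_at_k(example_ranking, example_judgments, k=3)
```

With the toy data, the judged ranking is a, b, c, d; the top three contain two documents at or above the threshold, so the score is 2/3.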
Details
| Original language | English |
|---|---|
| Title of host publication | Experimental IR Meets Multilinguality, Multimodality, and Interaction |
| Editors | Alberto Barrón-Cedeño, Giovanni Da San Martino, Guglielmo Faggioli, Nicola Ferro, Mirko Degli Esposti, Fabrizio Sebastiani, Craig Macdonald, Gabriella Pasi, Allan Hanbury, Martin Potthast |
| Publisher | Springer Science and Business Media B.V. |
| Pages | 175-189 |
| Number of pages | 15 |
| ISBN (electronic) | 978-3-031-13643-6 |
| ISBN (print) | 978-3-031-13642-9 |
| Publication status | Published - 2022 |
| Peer-reviewed | Yes |
Publication series
| Series | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 13390 LNCS |
| ISSN | 0302-9743 |
Conference
| Title | 13th Conference and Labs of the Evaluation Forum |
|---|---|
| Subtitle | Information Access Evaluation meets Multilinguality, Multimodality, and Visualization |
| Abbreviated title | CLEF 2022 |
| Conference number | 13 |
| Duration | 5 - 8 September 2022 |
| Location | Università di Bologna |
| City | Bologna |
| Country | Italy |
External IDs
| ORCID | /0000-0001-8107-2775/work/194824070 |
|---|---|
Keywords
- ARQMath Lab, BERT-based Models, Information Retrieval, Mathematical Language Processing