Transformer-Encoder-Based Mathematical Information Retrieval
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Mathematical Information Retrieval (MIR) is the task of finding relevant documents that contain both text and mathematical formulas. Retrieval systems must therefore be able to process not only natural language but also mathematical and scientific notation. In this work, we evaluate two transformer-encoder-based approaches on a question-answer retrieval task. Our pre-trained ALBERT model demonstrated competitive performance, ranking first on P′@10. Furthermore, we found that separating the pre-training data into chunks of text and formulas improved the overall performance on formula data.
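For readers unfamiliar with the metric named in the abstract: in ARQMath-style evaluation, the prime in P′@10 indicates that unjudged answers are removed from the ranking before precision at rank 10 is computed. The following is a minimal sketch of that idea, not the evaluation code used by the authors. The function name `p_prime_at_k`, the graded-relevance dictionary, and the relevance threshold of 2 are assumptions made for illustration.

```python
from typing import Dict, List


def p_prime_at_k(
    ranking: List[str],
    judgments: Dict[str, int],
    k: int = 10,
    min_relevance: int = 2,  # assumed cut-off for "relevant"; not from the source
) -> float:
    """Precision at k after dropping unjudged documents from the ranking."""
    # Keep only documents that have a relevance judgment, preserving rank order.
    judged = [doc for doc in ranking if doc in judgments]
    # Count relevant documents among the first k judged ones.
    hits = sum(1 for doc in judged[:k] if judgments[doc] >= min_relevance)
    # Divide by k, as in the usual definition of precision at k.
    return hits / k


# Hypothetical example: "x" is unjudged, so it is skipped rather than
# counted as non-relevant.
print(p_prime_at_k(["x", "a", "b"], {"a": 3, "b": 2}, k=2))  # → 1.0
```

With plain P@2 the same ranking would score 0.5, because the unjudged document at rank 1 would count as a miss.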
Details
| Original language | English |
|---|---|
| Title | Experimental IR Meets Multilinguality, Multimodality, and Interaction |
| Editors | Alberto Barrón-Cedeño, Giovanni Da San Martino, Guglielmo Faggioli, Nicola Ferro, Mirko Degli Esposti, Fabrizio Sebastiani, Craig Macdonald, Gabriella Pasi, Allan Hanbury, Martin Potthast |
| Publisher | Springer Science and Business Media B.V. |
| Pages | 175-189 |
| Number of pages | 15 |
| ISBN (electronic) | 978-3-031-13643-6 |
| ISBN (print) | 978-3-031-13642-9 |
| Publication status | Published - 2022 |
| Peer-review status | Yes |
Publication series
| Series | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 13390 LNCS |
| ISSN | 0302-9743 |
Conference
| Title | 13th Conference and Labs of the Evaluation Forum |
|---|---|
| Subtitle | Information Access Evaluation meets Multilinguality, Multimodality, and Visualization |
| Short title | CLEF 2022 |
| Event number | 13 |
| Duration | 5 - 8 September 2022 |
| Website | |
| Venue | Università di Bologna |
| City | Bologna |
| Country | Italy |
External IDs
| ORCID | /0000-0001-8107-2775/work/194824070 |
|---|
Subject terms
ASJC Scopus subject areas
Keywords
- ARQMath Lab, BERT-based Models, Information Retrieval, Mathematical Language Processing