Context-Aware Search for Environmental Data Using Dense Retrieval

Research output: Contribution to journal, Research article, Contributed, peer-review

Abstract

Searching for environmental data typically relies on lexical approaches, in which query terms are matched against metadata records using measures of term frequency. Dense retrieval approaches, in contrast, employ language models to capture the context and meaning of a query and return relevant search results. For environmental data, however, dense retrieval has not yet been researched, and no corpora or evaluation datasets exist for fine-tuning the models. This study demonstrates the adaptation of dense retrievers to the domain of climate-related scientific geodata. Four corpora containing text passages from various sources were used to train different dense retrievers. The domain-adapted dense retrievers are integrated into the search architecture of a standard metadata catalogue. To refine the search results further, we propose a spatial re-ranking stage after the initial retrieval phase. The evaluation demonstrates superior performance compared to BM25, the baseline model commonly used in metadata catalogues. No clear trends in performance were discovered when comparing the dense retrievers with one another. Aspects for further investigation are therefore identified so that the most suitable corpus composition can eventually be recommended.
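The two-stage idea summarised in the abstract (dense retrieval followed by spatial re-ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the spatial re-ranking is computed, so the bounding-box overlap score, the weight `alpha`, the record layout, and all function names below are assumptions made purely for illustration. Embeddings are passed in as plain lists of floats; in practice they would come from a fine-tuned language model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bbox_iou(a, b):
    """Intersection over union of two boxes (min_lon, min_lat, max_lon, max_lat).

    Assumption: spatial relevance is approximated by bounding-box overlap.
    """
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def search(query_vec, query_bbox, records, top_k=10, alpha=0.7):
    """Return record ids after dense retrieval and spatial re-ranking.

    Each record is a dict with hypothetical keys "id", "vec" and "bbox".
    """
    # Stage 1: dense retrieval, ranking all records by embedding similarity.
    scored = [(cosine(query_vec, r["vec"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = scored[:top_k]

    # Stage 2: re-rank the candidates with a weighted mix of the
    # text score and the spatial overlap (alpha is a hypothetical weight).
    reranked = [
        (alpha * score + (1 - alpha) * bbox_iou(query_bbox, r["bbox"]), r["id"])
        for score, r in candidates
    ]
    reranked.sort(key=lambda pair: pair[0], reverse=True)
    return [record_id for _, record_id in reranked]
```

In this sketch, a record whose text embedding is slightly less similar to the query can still rank first if its spatial extent matches the query region, which is the intended effect of a re-ranking stage.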

Details

Original language: English
Article number: 380
Journal: ISPRS International Journal of Geo-Information
Volume: 13
Issue number: 11
Publication status: Published - 30 Oct 2024
Peer-reviewed: Yes

External IDs

ORCID /0000-0002-9016-1996/work/171064695
ORCID /0000-0001-7144-3376/work/171065281

Keywords

  • BERT-based models, GeoAI, IR, Information Retrieval, NLP, SDI, language model