Context-Aware Search for Environmental Data Using Dense Retrieval

Publication: Contribution to journal > Research article > Contributed > Peer-reviewed

Abstract

The search for environmental data typically relies on lexical approaches, in which query terms are matched against metadata records using measures of term frequency. In contrast, dense retrieval approaches employ language models to capture the context and meaning of a query and return relevant search results. For environmental data, however, dense retrieval has not yet been researched, and there are no corpora or evaluation datasets with which to fine-tune the models. This study demonstrates the adaptation of dense retrievers to the domain of climate-related scientific geodata. Four corpora containing text passages from various sources were used to train different dense retrievers. The domain-adapted dense retrievers are integrated into the search architecture of a standard metadata catalogue. To improve the search results further, we propose a spatial re-ranking stage after the initial retrieval phase. The evaluation demonstrates superior performance compared to BM25, the baseline model commonly used in metadata catalogues. No clear trends in performance emerged when comparing the dense retrievers with one another. Therefore, aspects requiring further investigation are identified so that the most suitable corpus composition can eventually be recommended.

Details

Original language: English
Article number: 380
Journal: ISPRS International Journal of Geo-Information
Volume: 13
Issue number: 11
Publication status: Published - 30 Oct 2024
Peer-review status: Yes

External IDs

ORCID /0000-0002-9016-1996/work/171064695
ORCID /0000-0001-7144-3376/work/171065281
Scopus 85210227037

Keywords

  • GeoAI, IR, SDI, information retrieval