Evaluating Language Models for Knowledge Base Completion

Publication: Contribution to book/conference proceedings/anthology/report, contribution to conference proceedings, contributed, peer-reviewed

Contributors

Abstract

Structured knowledge bases (KBs) are a foundation of many intelligent applications, yet are notoriously incomplete. Language models (LMs) have recently been proposed for unsupervised knowledge base completion (KBC), yet, despite encouraging initial results, questions regarding their suitability remain open. Existing evaluations often fall short because they only evaluate on popular subjects, or sample already existing facts from KBs. In this work, we introduce a novel, more challenging benchmark dataset, and a methodology tailored for a realistic assessment of the KBC potential of LMs. For automated assessment, we curate a dataset called WD-Known, which provides an unbiased random sample of Wikidata, containing over 3.9 million facts. In a second step, we perform a human evaluation on predictions that are not yet in the KB, as only this provides real insights into the added value over existing KBs. Our key finding is that biases in dataset conception of previous benchmarks lead to a systematic overestimate of LM performance for KBC. However, our results also reveal strong areas of LMs. We could, for example, perform a significant completion of Wikidata on the relations nativeLanguage, by a factor of ∼21 (from 260k to 5.8M) at 82% precision, and citizenOf by a factor of ∼0.3 (from 4.2M to 5.3M) at 90% precision. Moreover, we find that LMs possess surprisingly strong generalization capabilities: even on relations where most facts were not directly observed in LM training, prediction quality can be high. We open-source the benchmark dataset and code (https://github.com/bveseli/LMsForKBC).
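The two completion "factors" quoted in the abstract are easiest to read as relative increases, i.e. (after - before) / before, rather than as plain ratios. The following minimal sketch checks that reading against the rounded fact counts given above. The function name `growth` is purely illustrative and is not taken from the authors' code.

```python
def growth(before: float, after: float) -> tuple[float, float]:
    """Return (ratio, relative_increase) for a fact count that grows
    from `before` to `after`.

    ratio             = after / before
    relative_increase = (after - before) / before = ratio - 1
    """
    ratio = after / before
    return ratio, ratio - 1.0


# Rounded fact counts as quoted in the abstract.
native_ratio, native_increase = growth(260_000, 5_800_000)
citizen_ratio, citizen_increase = growth(4_200_000, 5_300_000)

print(f"nativeLanguage: ratio {native_ratio:.2f}, relative increase {native_increase:.2f}")
print(f"citizenOf:      ratio {citizen_ratio:.2f}, relative increase {citizen_increase:.2f}")
```

Under this reading, nativeLanguage grows by roughly 21 times its original size and citizenOf by roughly 0.3 times (about 26%), which matches both figures in the abstract. Because the inputs are rounded counts, the results are approximate.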

Details

Original language: English
Title: The Semantic Web - 20th International Conference, ESWC 2023, Proceedings
Editors: Catia Pesquita, Daniel Faria, Ernesto Jimenez-Ruiz, Jamie McCusker, Mauro Dragoni, Anastasia Dimou, Raphael Troncy, Sven Hertling
Publisher: Springer Science and Business Media B.V.
Pages: 227-243
Number of pages: 17
ISBN (Print): 9783031334542
Publication status: Published - 2023
Peer-review status: Yes

Publication series

Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13870 LNCS
ISSN: 0302-9743

Conference

Title: 20th International Conference on The Semantic Web
Short title: ESWC 2023
Event number: 20
Duration: 28 May - 1 June 2023
Venue: Aldemar Knossos Royal & Royal Villa
City: Hersonissos
Country: Greece

External IDs

ORCID /0000-0002-5410-218X/work/185318147

Keywords