Combining Self-Retrieval-Augmented Generation with Divide-and-Conquer for Language Model-based Knowledge Base Construction

Research output: Contribution to book/Conference proceedings/Anthology/Report > Conference contribution > Contributed > peer-review

Abstract

Constructing knowledge bases from language models (LMs) without external retrieval presents unique challenges. We present a hybrid, LM-only system for the LM-KBC 2025 challenge [1], which requires constructing knowledge bases using a fixed model (Qwen3-8B) without fine-tuning or external retrieval. Our method combines Self-RAG for general relations with a divide-and-conquer module specialized for awardWonBy. Self-RAG follows a description-first, extraction-second design with strict output specifications (names-only or one-number-only) to reduce reliance on brittle post-hoc cleaning; numeric answers are normalized to a canonical digit form. The divide-and-conquer module aggregates candidates from constrained, names-only subqueries and filters them with a strict name validator. Evaluation uses the organizers’ official string-matching metric. On the hidden test leaderboard, our system places 2nd out of 5 participants and improves macro-F1 from 0.212 (baseline) to 0.405 (+0.194; ~+91.5% relative improvement), with large gains on companyTradesAtStockExchange (+0.339), personHasCityOfDeath (+0.330), and countryLandBordersCountry (+0.162).
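
The aggregate-then-filter step of the divide-and-conquer module can be illustrated with a minimal Python sketch. It is not the authors' implementation: the validator rules (length bounds, no digits, a small set of allowed punctuation), the case-insensitive deduplication, and the names `is_valid_name`, `aggregate_candidates`, and `ask_lm` are all assumptions made for illustration. `ask_lm` stands in for a call to the fixed model that returns one candidate per line.

```python
from typing import Callable, Iterable, List


def is_valid_name(text: str) -> bool:
    """Hypothetical strict validator: accept only strings that look like a name."""
    s = text.strip()
    if not (2 <= len(s) <= 80):          # assumed length bounds
        return False
    if not s[0].isalpha():               # must start with a letter
        return False
    # Letters plus a few punctuation marks common in names; digits are rejected.
    return all(ch.isalpha() or ch in " .'-" for ch in s)


def aggregate_candidates(
    subqueries: Iterable[str],
    ask_lm: Callable[[str], str],
) -> List[str]:
    """Union the answers of all subqueries, keeping only validated names.

    Duplicates are merged case-insensitively; the first spelling seen wins.
    """
    seen = {}
    for query in subqueries:
        for line in ask_lm(query).splitlines():
            name = line.strip().strip(",;").strip()
            if is_valid_name(name):
                seen.setdefault(name.casefold(), name)
    return list(seen.values())


if __name__ == "__main__":
    # Stubbed model responses, for demonstration only.
    canned = {
        "winners before 1950": "Sure, here are the winners:\nMarie Curie\nAlbert Einstein\n1921",
        "winners after 1950": "marie curie\nNiels Bohr",
    }
    print(aggregate_candidates(canned, lambda q: canned[q]))
```

In this sketch, chatty preambles and stray numbers in the model output are dropped by the validator rather than by post-hoc cleaning, which mirrors the motivation stated in the abstract.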

Details

Original language: English
Title of host publication: KBC-LM Workshop and LM-KBC Challenge at ISWC 2025
Editors: Simon Razniewski, Jan-Christoph Kalo, Duygu Islakoğlu, Tuan-Phong Nguyen, Bohui Zhang
Number of pages: 20
Publication status: Published - 2025
Peer-reviewed: Yes

Publication series

Series: CEUR Workshop Proceedings
Volume: 4041
ISSN: 1613-0073

Other

Title: 4th challenge on Knowledge Base Construction from Pre-trained Language Models
Abbreviated title: LM-KBC 2025
Conference number: 4
Description: Co-located with the 24th International Semantic Web Conference (ISWC 2025)
Duration: 2 November 2025
Location: Nara Prefectural Convention Center
City: Nara
Country: Japan

External IDs

ORCID /0000-0002-5410-218X/work/194826582

Keywords

  • Divide-and-Conquer, Knowledge base construction, Language models, LM-KBC, Self-RAG