Bridging language models and knowledge graphs with controlled natural languages

Research output: Contribution to journal, Research article, Contributed, peer-reviewed

Contributors

Abstract

Knowledge graphs (KGs) provide a source of up-to-date structured knowledge, which makes them an ideal counterpart to large language models (LLMs). LLMs, by themselves, are not trained to run structured queries internally and can become stale without a source of up-to-date information. We hypothesize that knowledge graphs can be effectively connected to LLMs via controlled natural languages (CNLs). Unlike standard formal query languages, CNLs offer a syntax close to human language, yet they can be unambiguously converted into formal languages such as SPARQL. In this article, we explore the premise that the extensive pre-training of LLMs on diverse textual data enables them to perform semantic parsing into CNLs more accurately than parsing directly into formal query languages. To evaluate our hypothesis, we constructed a dataset that facilitates the comparison between a standard formal language and two controlled natural languages. Our findings show a significant accuracy improvement when the same number of training samples is used with a CNL instead of a formal query language. Additionally, fewer samples are required to achieve a desired performance when using CNLs compared to standard query languages. The higher data efficiency of CNLs is particularly important for reducing the complexity and cost of data collection and curation. This enables a more efficient way for LLMs to query KGs.
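To illustrate what an unambiguous conversion from a controlled natural language into SPARQL can look like, the following minimal Python sketch maps a single fixed sentence pattern onto a query template. The sentence pattern, the `ex:` prefix, the lookup tables, and the function name `cnl_to_sparql` are all hypothetical and invented for this illustration; they are not the controlled natural languages or the dataset evaluated in the article.

```python
import re

# Hypothetical toy grammar: exactly one sentence pattern is accepted.
# This is an illustrative assumption, not a CNL from the article.
PATTERN = re.compile(r"^What is the (?P<prop>[a-z ]+) of (?P<entity>[A-Za-z ]+)\?$")

# Hypothetical label-to-identifier lookups (example.org names are placeholders).
PROPERTIES = {"capital": "ex:capital", "population": "ex:population"}
ENTITIES = {"France": "ex:France", "Japan": "ex:Japan"}


def cnl_to_sparql(sentence: str) -> str:
    """Deterministically translate one toy CNL sentence into a SPARQL query."""
    match = PATTERN.match(sentence)
    if match is None:
        raise ValueError(f"Sentence is not in the toy CNL: {sentence!r}")
    prop_label, entity_label = match["prop"], match["entity"]
    if prop_label not in PROPERTIES or entity_label not in ENTITIES:
        raise ValueError(f"Unknown property or entity in: {sentence!r}")
    # Each valid sentence maps to exactly one query, so the conversion is unambiguous.
    return (
        "PREFIX ex: <http://example.org/> "
        f"SELECT ?answer WHERE {{ {ENTITIES[entity_label]} {PROPERTIES[prop_label]} ?answer . }}"
    )


if __name__ == "__main__":
    print(cnl_to_sparql("What is the capital of France?"))
```

Under this setup, a language model would only need to produce the near-natural sentence, while the rule-based step above would produce the formal query.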

Details

Original language: English
Article number: 115405
Journal: Knowledge-based systems
Volume: 337
Publication status: Published - 25 Mar 2026
Peer-reviewed: Yes

Keywords

  • Controlled natural language, Entity linking, Knowledge graphs question answering, Large language models, Sample efficiency, Semantic parsing