Bridging language models and knowledge graphs with controlled natural languages

Dhananjay Bhandiwad; Preetam Gattogi; Ashish Kangen; Marco Basaldella; Sébastien Ferré; Sahar Vahdati; Jens Lehmann

doi:10.1016/j.knosys.2026.115405

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

Dhananjay Bhandiwad, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Preetam Gattogi, Department Cognitive AI (Author)
Ashish Kangen, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Marco Basaldella, Amazon Oxford Ltd (Author)
Sébastien Ferré, Université de Rennes 1 (Author)
Sahar Vahdati, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Jens Lehmann, Amazon Development Center Germany GmbH (Author)

Abstract

Knowledge graphs (KGs) provide a source of up-to-date structured knowledge, which makes them an ideal counterpart to large language models (LLMs). LLMs, by themselves, are not trained to run structured queries internally and can become stale without a source of up-to-date information. We hypothesize that knowledge graphs can be effectively connected to LLMs via controlled natural languages (CNLs). Unlike standard formal query languages, CNLs offer a syntax close to human language, yet they can be unambiguously converted into formal languages such as SPARQL. In this article, we explore the premise that the extensive pre-training of LLMs on diverse textual data enables them to perform semantic parsing into CNLs more accurately than parsing directly into formal query languages. To evaluate our hypothesis, we constructed a dataset that facilitates comparison between a standard formal language and two CNLs. Our findings show a significant accuracy improvement when using the same number of CNL training samples. Additionally, fewer samples are required to achieve a desired performance when using CNLs compared to standard query languages. The higher data efficiency of CNLs is particularly important for reducing the complexity and cost of collecting and curating training data. This enables a more efficient way for LLMs to query KGs.

Details

Original language	English
Article number	115405
Journal	Knowledge-Based Systems
Volume	337
Publication status	Published - 25 Mar 2026
Peer-reviewed	Yes

Keywords

Controlled natural language, Entity linking, Knowledge graphs question answering, Large language models, Sample efficiency, Semantic parsing

Research Portal of the TU Dresden
