Talmud-IR: A Talmud-Inspired Interface for Discussing RAG Response Quality

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Abstract

Retrieval-augmented generation (RAG) systems promise factually grounded answers, yet evaluating their quality remains difficult. Automated metrics and LLM-as-judge approaches offer scalability but risk circularity, benchmark leakage, and loss of diversity. Human assessors, meanwhile, often struggle to notice subtle omissions or hallucinations when responses appear linguistically fluent and confident. We present Talmud-IR, a novel user interface inspired by the dialogic structure of the Talmud. It visualizes RAG outputs as a central text surrounded by layers of evidence, commentary, and meta-assessment, enabling sustained human–LLM discussion about system quality and failure priorities. The prototype supports comparative RAG evaluation, collaborative exploration of “unknown unknowns,” and pedagogical use for teaching critical reading of AI-generated content. Code and Prototype: https://github.com/WojciechKusa/talmud-ir

Details

Original language: English
Title of host publication: Advances in Information Retrieval
Editors: Ricardo Campos, Adam Jatowt, Yanyan Lan, Mohammad Aliannejadi, Christine Bauer, Sean MacAvaney, Avishek Anand, Nan Bai, Masoud Mansoury, Zhaochun Ren, Suzan Verberne
Publisher: Springer Science and Business Media B.V.
Pages: 148-153
Number of pages: 6
ISBN (electronic): 978-3-032-21321-1
ISBN (print): 978-3-032-21320-4
Publication status: Published - Mar 2026
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science
Volume: 16486 LNCS
ISSN: 0302-9743

Conference

Title: 48th European Conference on Information Retrieval
Abbreviated title: ECIR 2026
Conference number: 48
Duration: 29 March - 2 April 2026
Location: Lijm & Cultuur
City: Delft
Country: Netherlands

Keywords

  • Exploratory Evaluation, LLM judge, RAG