Talmud-IR: A Talmud-Inspired Interface for Discussing RAG Response Quality

Publication: Contribution to book/conference proceedings/anthology/report, contribution to conference proceedings, contributed, peer-reviewed

Contributors

Abstract

Retrieval-augmented generation (RAG) systems promise factually grounded answers, yet evaluating their quality remains difficult. Automated metrics and LLM-as-judge approaches offer scalability but risk circularity, benchmark leakage, and loss of diversity. Human assessors, meanwhile, often struggle to notice subtle omissions or hallucinations when responses appear linguistically fluent and confident. We present Talmud-IR, a novel user interface inspired by the dialogic structure of the Talmud. It visualizes RAG outputs as a central text surrounded by layers of evidence, commentary, and meta-assessment, enabling sustained human–LLM discussion about system quality and failure priorities. The prototype supports comparative RAG evaluation, collaborative exploration of “unknown unknowns,” and pedagogical use for teaching critical reading of AI-generated content. Code and Prototype: https://github.com/WojciechKusa/talmud-ir

Details

Original language: English
Title: Advances in Information Retrieval
Editors: Ricardo Campos, Adam Jatowt, Yanyan Lan, Mohammad Aliannejadi, Christine Bauer, Sean MacAvaney, Avishek Anand, Nan Bai, Masoud Mansoury, Zhaochun Ren, Suzan Verberne
Publisher: Springer Science and Business Media B.V.
Pages: 148-153
Number of pages: 6
ISBN (electronic): 978-3-032-21321-1
ISBN (print): 978-3-032-21320-4
Publication status: Published - March 2026
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science
Volume: 16486 LNCS
ISSN: 0302-9743

Conference

Title: 48th European Conference on Information Retrieval
Abbreviated title: ECIR 2026
Conference number: 48
Duration: 29 March - 2 April 2026
Location: Lijm & Cultuur
City: Delft
Country: Netherlands

Keywords

  • Exploratory Evaluation, LLM judge, RAG