L3X: Long Object List Extraction from Long Documents
Publication: Contribution to book/conference proceedings/edited volume/report › Paper in conference proceedings › Contributed › Peer-reviewed
Abstract
Information extraction with LLMs is typically geared toward extracting individual subject-predicate-object (SPO) triples from short factual texts such as Wikipedia or news articles. In contrast, the L3X methodology tackles the task of extracting long lists from long texts: given a target subject S and predicate P, the goal is to extract the complete list of all objects O for which SPO holds. This is especially challenging over long texts, such as entire books or large web crawls, where many objects are long-tail entities. We demonstrate L3X, a web-based system designed for this previously unexplored task. L3X consists of recall-oriented candidate generation using LLMs in retrieval-augmented generation (RAG) mode, with novel methods for ranking and batching passages, followed by precision-oriented scrutinization of the candidates. Our demo supports exploring multiple configurations, including LLM-only and RAG baselines, and showcases use cases such as fiction-character relations from book series (e.g., 50+ friends of Harry Potter) and business relations from web pages (e.g., CEOs of Toyota).
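The abstract describes a two-stage design (recall-oriented candidate generation over ranked, batched passages, then precision-oriented scrutinization) without giving its internals. The Python sketch below illustrates only the general shape of such a pipeline, under loudly stated assumptions: the term-overlap ranker, the fixed-size batching, and every function name (`rank_passages`, `batches`, `extract_list`, `toy_generate`, `toy_verify`) are hypothetical. In particular, `toy_generate` and `toy_verify` are rule-based stand-ins for LLM calls and are not the ranking, batching, or scrutinization methods of L3X.

```python
from typing import Callable, Iterator

# Illustrative sketch only; NOT the L3X implementation.
# Hypothetical interfaces: in a real system both would wrap LLM calls.
Generator = Callable[[str, str, list[str]], set[str]]   # (S, P, passages) -> candidate objects
Verifier = Callable[[str, str, str, list[str]], bool]  # (S, P, O, evidence) -> accept?


def tokens(text: str) -> list[str]:
    """Whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,;:!?") for w in text.split()]


def rank_passages(passages: list[str], subject: str, predicate: str) -> list[str]:
    """Order passages by term overlap with the query (a naive stand-in ranker)."""
    query = {subject.lower(), predicate.lower()}

    def score(p: str) -> int:
        return len(query & {t.lower() for t in tokens(p)})

    # sorted() is stable, so ties keep their original document order.
    return sorted(passages, key=score, reverse=True)


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Split the ranked passages into consecutive batches of at most `size`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def extract_list(subject: str, predicate: str, passages: list[str],
                 generate: Generator, verify: Verifier, batch_size: int = 2) -> list[str]:
    ranked = rank_passages(passages, subject, predicate)
    # Stage 1 (recall): collect every candidate proposed for any batch,
    # remembering the passages it came from as evidence.
    evidence: dict[str, list[str]] = {}
    for chunk in batches(ranked, batch_size):
        for obj in generate(subject, predicate, chunk):
            evidence.setdefault(obj, []).extend(chunk)
    # Stage 2 (precision): keep only candidates the verifier accepts.
    return sorted(o for o, ev in evidence.items() if verify(subject, predicate, o, ev))


def toy_generate(subject: str, predicate: str, chunk: list[str]) -> set[str]:
    # Hypothetical stand-in for an LLM; over-generates on purpose:
    # every capitalized token except the subject itself.
    return {t for p in chunk for t in tokens(p) if t[:1].isupper() and t != subject}


def toy_verify(subject: str, predicate: str, obj: str, evidence: list[str]) -> bool:
    # Hypothetical stand-in for an LLM check: accept only if a single passage
    # mentions the object together with the predicate word.
    return any(obj in tokens(p) and predicate in tokens(p.lower()) for p in evidence)


passages = [
    "Harry and his friend Ron boarded the train.",
    "Hermione became a friend of Harry after the troll incident.",
    "Draco sneered at Harry in the corridor.",
    "The castle had many staircases.",
]
print(extract_list("Harry", "friend", passages, toy_generate, toy_verify))
# ['Hermione', 'Ron']
```

In this toy run the first stage also proposes "Draco" and "The", and the second stage discards them because no evidence passage links them to "friend". That over-generate-then-filter split mirrors the recall-then-precision order stated in the abstract. Replacing the toy functions with real prompts would leave the control flow of the sketch unchanged.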
Details
| Original language | English |
|---|---|
| Host publication title | CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management |
| Pages | 6693-6697 |
| Number of pages | 5 |
| Publication status | Published - 10 Nov 2025 |
| Peer-reviewed | Yes |
Conference
| Title | 34th ACM International Conference on Information and Knowledge Management |
|---|---|
| Short title | CIKM 2025 |
| Event number | 34 |
| Period | 10-14 November 2025 |
| Venue | COEX |
| City | Seoul |
| Country | South Korea |
External IDs
| ORCID | /0000-0002-5410-218X/work/200631820 |
|---|---|
Keywords
- information extraction, long documents, narrative text