L3X: Long Object List Extraction from Long Documents
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Information extraction with LLMs is typically geared toward extracting individual subject-predicate-object (SPO) triples from short factual texts such as Wikipedia or news articles. In contrast, the L3X methodology tackles the task of extracting long lists from long texts: given a target subject S and predicate P, the goal is to extract the complete list of all objects O for which SPO holds. This is especially challenging over long texts, like entire books or large web crawls, where many objects are long-tail entities. We demonstrate L3X, a web-based system designed for this previously unexplored task. L3X comprises recall-oriented candidate generation using LLMs in RAG mode, with novel methods for ranking and batching passages, followed by precision-oriented scrutinization. Our demo supports exploring multiple configurations, including LLM-only and RAG baselines, showcasing use cases like fiction-character relations from book series (e.g., 50+ friends of Harry Potter) and business relations from web pages (e.g., CEOs of Toyota).
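The two-stage shape described in the abstract (recall-oriented candidate generation over ranked and batched passages, then a precision-oriented filter) can be illustrated with a toy sketch. This is not the L3X implementation: the lexical ranker, the `toy_extractor` stand-in for an LLM call, the support-count filter, and all names and parameters below are hypothetical choices made only for illustration.

```python
from collections import Counter
from typing import Callable

# Hypothetical toy corpus; a real system would work over a whole book or web crawl.
PASSAGES = [
    "Harry and his friend Ron met Hermione on the train.",
    "Ron is a loyal friend of Harry.",
    "Hermione, a friend of Harry, solved the riddle.",
    "Draco sneered at Harry in the corridor.",
    "The weather in Scotland was grey.",
]

def rank_passages(passages: list[str], subject: str, predicate: str) -> list[str]:
    """Order passages by naive lexical overlap with S and P (placeholder ranker)."""
    terms = {t.lower() for t in f"{subject} {predicate}".split()}

    def score(passage: str) -> int:
        return sum(1 for w in passage.lower().split() if w.strip(".,;!?") in terms)

    # sorted() is stable, so ties keep their original document order.
    return sorted(passages, key=score, reverse=True)

def batch(items: list[str], size: int) -> list[list[str]]:
    """Split the ranked passages into fixed-size groups, one per extractor call."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def toy_extractor(subject: str, predicate: str, group: list[str]) -> list[str]:
    """Stand-in for an LLM call: returns known names mentioned in the batch."""
    known = ["Ron", "Hermione", "Hagrid", "Draco"]
    text = " ".join(group)
    return [name for name in known if name in text]

def extract_long_list(
    passages: list[str],
    subject: str,
    predicate: str,
    extract: Callable[[str, str, list[str]], list[str]],
    top_k: int = 4,
    batch_size: int = 2,
    min_support: int = 2,
) -> tuple[list[str], list[str]]:
    """Return (all candidates, candidates kept after the support filter)."""
    ranked = rank_passages(passages, subject, predicate)[:top_k]
    support: Counter[str] = Counter()
    for group in batch(ranked, batch_size):
        # Recall stage: pool every candidate, counting it once per batch.
        support.update(set(extract(subject, predicate, group)))
    candidates = sorted(support)
    # Precision stage (assumed heuristic): keep objects seen in enough batches.
    accepted = sorted(o for o, n in support.items() if n >= min_support)
    return candidates, accepted

if __name__ == "__main__":
    print(extract_long_list(PASSAGES, "Harry", "friend of", toy_extractor))
```

The support threshold is only one simple way to trade recall for precision; the abstract does not say how the scrutinization step is realized.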
Details
| Original language | English |
|---|---|
| Title of host publication | CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management |
| Pages | 6693-6697 |
| Number of pages | 5 |
| Publication status | Published - 10 Nov 2025 |
| Peer-reviewed | Yes |
Conference
| Title | 34th ACM International Conference on Information and Knowledge Management |
|---|---|
| Abbreviated title | CIKM 2025 |
| Conference number | 34 |
| Duration | 10 - 14 November 2025 |
| Location | COEX |
| City | Seoul |
| Country | Korea, Republic of |
External IDs
| ORCID | /0000-0002-5410-218X/work/200631820 |
|---|---|
Keywords
- information extraction, long documents, narrative text