Table recognition in spreadsheets via a graph representation
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Spreadsheet applications are very popular data management tools. Their ease of use and abundant functionality equip novices and professionals alike with the means to generate, transform, analyze, and visualize data. As a result, spreadsheets are a rich source of factual and structured information, which accentuates the need to automatically understand and extract their contents. In this paper, we present a novel approach for recognizing tables in spreadsheets. Having inferred the layout role of the individual cells, we build layout regions. We encode the spatial interrelations between these regions using a graph representation. Based on this, we propose Remove and Conquer (RAC), an algorithm for table recognition that implements a list of carefully curated rules. An extensive experimental evaluation shows that our approach is viable: it achieves significant accuracy on a dataset of real spreadsheets from various domains.
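To make the idea of a graph over layout regions concrete, the following is a minimal sketch and not the RAC algorithm itself, whose rules are not given in this record. It assumes that regions are axis-aligned rectangles of cells carrying a role label, that an edge connects two regions only when they touch directly, and that connected components serve as naive table candidates. The names `Region`, `relation`, `build_graph`, and `candidate_tables`, as well as the role labels "Header" and "Data", are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    """A rectangular layout region; bounds are inclusive, 0-based cell indices."""
    name: str
    role: str  # hypothetical role label, e.g. "Header" or "Data"
    top: int
    left: int
    bottom: int
    right: int


def overlap(a_lo, a_hi, b_lo, b_hi):
    """True if two inclusive integer intervals share at least one index."""
    return a_lo <= b_hi and b_lo <= a_hi


def relation(a, b):
    """Position of b relative to a if the two regions touch, else None."""
    if overlap(a.left, a.right, b.left, b.right):
        if b.top == a.bottom + 1:
            return "BOTTOM"
        if b.bottom == a.top - 1:
            return "TOP"
    if overlap(a.top, a.bottom, b.top, b.bottom):
        if b.left == a.right + 1:
            return "RIGHT"
        if b.right == a.left - 1:
            return "LEFT"
    return None


OPPOSITE = {"TOP": "BOTTOM", "BOTTOM": "TOP", "LEFT": "RIGHT", "RIGHT": "LEFT"}


def build_graph(regions):
    """Map (a.name, b.name) to the position of b relative to a."""
    edges = {}
    for a, b in combinations(regions, 2):
        rel = relation(a, b)
        if rel is not None:
            edges[(a.name, b.name)] = rel
            edges[(b.name, a.name)] = OPPOSITE[rel]
    return edges


def candidate_tables(regions, edges):
    """Group regions into connected components (assumed stand-in for rules)."""
    neighbours = {r.name: set() for r in regions}
    for a, b in edges:
        neighbours[a].add(b)
    seen, groups = set(), []
    for r in regions:
        if r.name in seen:
            continue
        seen.add(r.name)
        stack, group = [r.name], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        groups.append(sorted(group))
    return groups


# Two side-by-side tables separated by empty columns 3 and 4.
regions = [
    Region("H1", "Header", top=0, left=0, bottom=0, right=2),
    Region("D1", "Data", top=1, left=0, bottom=4, right=2),
    Region("H2", "Header", top=0, left=5, bottom=0, right=6),
    Region("D2", "Data", top=1, left=5, bottom=3, right=6),
]
edges = build_graph(regions)
tables = candidate_tables(regions, edges)
print(tables)
```

In this toy sheet each header region touches the data region below it, while the empty columns keep the two groups apart, so the grouping yields two candidates. A rule-based method such as the one described in the abstract would presumably inspect roles and edge directions rather than rely on connectivity alone.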
Details
| Original language | English |
|---|---|
| Title | Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018 |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Pages | 139-144 |
| Number of pages | 6 |
| ISBN (electronic) | 9781538633465 |
| Publication status | Published - 22 June 2018 |
| Peer-reviewed | Yes |
Publication series
| Series | 2018 13th IAPR International Workshop on Document Analysis Systems (DAS) |
|---|---|
Conference
| Title | 13th IAPR International Workshop on Document Analysis Systems |
|---|---|
| Short title | DAS 2018 |
| Duration | 24 - 27 April 2018 |
| Website | |
| City | Vienna |
| Country | Austria |
External IDs
| Scopus | 85050289070 |
|---|---|
| ORCID | /0000-0001-8107-2775/work/142253471 |
Keywords
- Graph, Rule-based, Spreadsheet, Table Identification, Table Recognition