XLIndy: Interactive recognition and information extraction in spreadsheets
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Abstract
Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise because spreadsheets are partially structured and carry implicit (visual and textual) information. This creates a bottleneck for the automatic analysis and extraction of information. Therefore, we present XLIndy, a Microsoft Excel add-in with a machine learning back-end, written in Python. It showcases our novel methods for layout inference and table recognition in spreadsheets. For a selected task and method, users can visually inspect the results, change configurations, and compare different runs, which enables iterative fine-tuning. Additionally, users can manually revise the predicted layout and tables and subsequently save them as annotations. These annotations are used to measure performance and (re-)train classifiers. Finally, data in the recognized tables can be extracted for further processing. XLIndy supports several standard formats, such as CSV and JSON.
Details
Original language | English |
---|---|
Title | Proceedings of the ACM Symposium on Document Engineering, DocEng 2019 |
Publisher | Association for Computing Machinery (ACM), New York |
Pages | 25:1-25:4 |
Number of pages | 4 |
ISBN (electronic) | 978-1-4503-6887-2 |
Publication status | Published - 23 Sept. 2019 |
Peer-reviewed | Yes |
Publication series
Series | DocEng: Document Engineering |
---|---|
Conference
Title | 19th ACM Symposium on Document Engineering, DocEng 2019 |
---|---|
Duration | 23-26 September 2019 |
City | Berlin |
Country | Germany |
External IDs
dblp | conf/doceng/KociKLOTGL019 |
---|---|
ORCID | /0000-0001-8107-2775/work/142253491 |
ORCID | /0000-0002-5985-4348/work/162348855 |
Keywords
- Add-in, Annotation, Excel, Information extraction, Interactive, Layout inference, Spreadsheets, Table recognition