XLIndy: Interactive recognition and information extraction in spreadsheets
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise because spreadsheets are only partially structured and carry implicit (visual and textual) information. This creates a bottleneck for the automatic analysis and extraction of information. Therefore, we present XLIndy, a Microsoft Excel add-in with a machine learning back-end, written in Python. It showcases our novel methods for layout inference and table recognition in spreadsheets. For a selected task and method, users can visually inspect the results, change configurations, and compare different runs, which enables iterative fine-tuning. Additionally, users can manually revise the predicted layout and tables and then save them as annotations. These annotations are used to measure performance and (re-)train classifiers. Finally, data in the recognized tables can be extracted for further processing. XLIndy supports several standard formats, such as CSV and JSON.
Details
Original language | English |
---|---|
Title of host publication | Proceedings of the ACM Symposium on Document Engineering, DocEng 2019 |
Publisher | Association for Computing Machinery (ACM), New York |
Pages | 25:1-25:4 |
Number of pages | 4 |
ISBN (electronic) | 978-1-4503-6887-2 |
Publication status | Published - 23 Sept 2019 |
Peer-reviewed | Yes |
Publication series
Series | DocEng: Document Engineering |
---|---|
Conference
Title | 19th ACM Symposium on Document Engineering, DocEng 2019 |
---|---|
Duration | 23 - 26 September 2019 |
City | Berlin |
Country | Germany |
External IDs
dblp | conf/doceng/KociKLOTGL019 |
---|---|
ORCID | /0000-0001-8107-2775/work/142253491 |
ORCID | /0000-0002-5985-4348/work/162348855 |
Keywords
- Add-in, Annotation, Excel, Information extraction, Interactive, Layout inference, Spreadsheets, Table recognition