XLIndy: Interactive recognition and information extraction in spreadsheets

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review

Abstract

Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, spreadsheets are only partially structured and carry implicit (visual and textual) information, which creates a bottleneck for automatic analysis and information extraction. To address this, we present XLIndy, a Microsoft Excel add-in with a machine learning back-end written in Python. It showcases our novel methods for layout inference and table recognition in spreadsheets. For a selected task and method, users can visually inspect the results, change configurations, and compare different runs, which enables iterative fine-tuning. Additionally, users can manually revise the predicted layout and tables and save them as annotations. These annotations are then used to measure performance and to (re-)train classifiers. Finally, data in the recognized tables can be extracted for further processing; XLIndy supports several standard formats for this, such as CSV and JSON.
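The final extraction step can be illustrated with a short sketch of turning a recognized table region into CSV and JSON. This is not XLIndy's code. It assumes the sheet is available as a list of rows of cell values, that a recognized table is described by a hypothetical `TableRange` bounding box (0-based, inclusive), and that the first row of that region holds the column headers.

```python
from __future__ import annotations

import csv
import io
import json
from typing import Any, NamedTuple


class TableRange(NamedTuple):
    """Hypothetical bounding box of a recognized table (0-based, inclusive)."""
    top: int
    left: int
    bottom: int
    right: int


def slice_table(sheet: list[list[Any]], rng: TableRange) -> list[list[Any]]:
    """Cut the recognized region out of the full sheet grid."""
    return [row[rng.left:rng.right + 1] for row in sheet[rng.top:rng.bottom + 1]]


def to_records(table: list[list[Any]]) -> list[dict[str, Any]]:
    """Map each body row to a dict, assuming the first row holds the headers."""
    header, *body = table
    keys = [str(h) for h in header]
    return [dict(zip(keys, row)) for row in body]


def to_csv(table: list[list[Any]]) -> str:
    """Serialize the table region, header row included, as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()


def to_json(table: list[list[Any]]) -> str:
    """Serialize the table as a JSON array of header-keyed records."""
    return json.dumps(to_records(table), indent=2)


if __name__ == "__main__":
    # Illustrative sheet: a title cell and an empty row above the actual table.
    sheet = [
        ["Sales report", None, None, None],
        [None, None, None, None],
        [None, "Region", "Q1", "Q2"],
        [None, "North", 120, 135],
        [None, "South", 98, 104],
    ]
    table = slice_table(sheet, TableRange(top=2, left=1, bottom=4, right=3))
    print(to_csv(table))
    print(to_json(table))
```

Keeping the region as a plain grid until serialization means the same slice can feed both formats; only the JSON path needs the header assumption, since CSV simply writes the rows as they appear.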

Details

Original language: English
Title of host publication: Proceedings of the ACM Symposium on Document Engineering, DocEng 2019
Publisher: Association for Computing Machinery (ACM), New York
Pages: 25:1-25:4
Number of pages: 4
ISBN (electronic): 978-1-4503-6887-2
Publication status: Published - 23 Sept 2019
Peer-reviewed: Yes

Publication series

Series: DocEng: Document Engineering

Conference

Title: 19th ACM Symposium on Document Engineering, DocEng 2019
Duration: 23 - 26 September 2019
City: Berlin
Country: Germany

External IDs

dblp: conf/doceng/KociKLOTGL019
ORCID: /0000-0001-8107-2775/work/142253491
ORCID: /0000-0002-5985-4348/work/162348855

Keywords

  • Add-in, Annotation, Excel, Information extraction, Interactive, Layout inference, Spreadsheets, Table recognition