Pre-trained web table embeddings for table discovery

Michael Günther; Maik Thiele; Julius Gonsior; Wolfgang Lehner

doi:10.1145/3464509.3464892

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Michael Günther, Chair of Databases (Author)
Maik Thiele, Chair of Databases (Author)
Julius Gonsior, Chair of Databases (Author)
Wolfgang Lehner, Chair of Databases (Author)

Abstract

Pre-trained word embedding models have become the de facto standard for modeling text in state-of-the-art analysis tools and frameworks. However, while massive amounts of textual data are stored in tables, word embedding models are usually pre-trained on large documents. This mismatch can reduce performance on tasks that analyze text values in tables. To improve analysis and retrieval tasks on tabular data, we propose a novel embedding technique that is pre-trained directly on a large Web table corpus. In an experimental evaluation, we employ our models for various data analysis tasks on different data sources. Our evaluation shows that models using pre-trained Web table embeddings outperform the same models using embeddings pre-trained on text. Moreover, we show that models using Web table embeddings can outperform state-of-the-art models for the investigated tasks.

Details

Original language	English
Title of host publication	Proceedings of the 4th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2021
Publisher	Association for Computing Machinery, Inc
Pages	24-31
Number of pages	8
ISBN (electronic)	9781450385350
Publication status	Published - 20 Jun 2021
Peer-reviewed	Yes

Workshop

Title	4th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management
Abbreviated title	aiDM 2021
Conference number	4
Description	Co-located with ACM SIGMOD/PODS 2021
Duration	25 June 2021
Location	Qujiang Hotel & Online
City	Xi'an
Country	China

External IDs

Scopus	85109891275
ORCID	/0000-0001-8107-2775/work/142253440
ORCID	/0000-0002-5985-4348/work/162348853
