Pre-trained web table embeddings for table discovery

Michael Günther; Maik Thiele; Julius Gonsior; Wolfgang Lehner

doi:10.1145/3464509.3464892

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Michael Günther, Chair of Databases (Author)
Maik Thiele, Chair of Databases (Author)
Julius Gonsior, Chair of Databases (Author)
Wolfgang Lehner, Chair of Databases (Author)

Abstract

Pre-trained word embedding models have become the de facto standard to model text in state-of-the-art analysis tools and frameworks. However, while there are massive amounts of textual data stored in tables, word embedding models are usually pre-trained on large documents. This mismatch can lead to reduced performance on tasks where text values in tables are analyzed. To improve analysis and retrieval tasks working with tabular data, we propose a novel embedding technique to be pre-trained directly on a large Web table corpus. In an experimental evaluation, we employ our models for various data analysis tasks on different data sources. Our evaluation shows that models using pre-trained Web table embeddings outperform the same models when applied to embeddings pre-trained on text. Moreover, we show that by using Web table embeddings, state-of-the-art models for the investigated tasks can be outperformed.

Details

Original language	English
Title	Proceedings of the 4th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2021
Publisher	Association for Computing Machinery, Inc
Pages	24-31
Number of pages	8
ISBN (electronic)	9781450385350
Publication status	Published - 20 June 2021
Peer-review status	Yes

Conference

Title	4th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2021
Duration	20 - 25 June 2021
City	Virtual, Online
Country	China

External IDs

Scopus	85109891275
ORCID	/0000-0001-8107-2775/work/142253440
ORCID	/0000-0002-5985-4348/work/162348853
