FacetE: Exploiting web tables for domain-specific word embedding evaluation

Michael Günther; Paul Sikorski; Maik Thiele; Wolfgang Lehner

doi:10.1145/3395032.3395325

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Michael Günther, Chair of Databases (Author)
Paul Sikorski, TUD Dresden University of Technology (Author)
Maik Thiele, Chair of Databases (Author)
Wolfgang Lehner, Chair of Databases (Author)

Abstract

Today's natural language processing and information retrieval systems depend heavily on word embedding techniques to represent text values. However, choosing a word embedding dataset for a specific task is not trivial. Current word embedding evaluation methods mostly provide only a one-dimensional quality measure, which does not express how knowledge from different domains is represented in the word embedding models. To overcome this limitation, we provide a new evaluation dataset called FacetE, derived from 125M Web tables, which enables domain-sensitive evaluation. We show that FacetE can be used effectively to evaluate word embedding models. The evaluation of common general-purpose word embedding models suggests that currently no single word embedding is best for every domain.

Details

Original language	English
Title of host publication	DBTest '20: Proceedings of the workshop on Testing Database Systems
Publisher	Association for Computing Machinery (ACM), New York
Pages	1-6
Number of pages	6
ISBN (electronic)	978-1-4503-8001-0
Publication status	Published - 19 Jun 2020
Peer-reviewed	Yes

Publication series

Series	MOD: International Conference on Management of Data (DBTest)

Workshop

Title	8th Workshop on Testing Database Systems
Abbreviated title	DBTest 2020
Conference number	8
Description	held at ACM SIGMOD/PODS 2020
Duration	19 June 2020
Website	https://sigmod2020.org/sigmod_workshops.shtml
Location	Online
City	Portland
Country	United States of America

External IDs

Scopus	85086066357
ORCID	/0000-0001-8107-2775/work/142253453

Keywords

ASJC Scopus subject areas

Software
Safety, Risk, Reliability and Quality
