FacetE: Exploiting web tables for domain-specific word embedding evaluation
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Today's natural language processing and information retrieval systems depend heavily on word embedding techniques to represent text values. However, given a specific task, choosing a word embedding dataset is not trivial. Current word embedding evaluation methods mostly provide only a one-dimensional quality measure, which does not express how knowledge from different domains is represented in the word embedding models. To overcome this limitation, we provide a new evaluation dataset called FacetE, derived from 125M Web tables, that enables domain-sensitive evaluation. We show that FacetE can effectively be used to evaluate word embedding models. The evaluation of common general-purpose word embedding models suggests that there is currently no single word embedding that is best for every domain.
Details
Original language | English |
---|---|
Title | DBTest '20: Proceedings of the workshop on Testing Database Systems |
Publisher | Association for Computing Machinery (ACM), New York |
Pages | 1-6 |
Number of pages | 6 |
ISBN (electronic) | 978-1-4503-8001-0 |
Publication status | Published - 19 June 2020 |
Peer-review status | Yes |
Publication series
Series | MOD: International Conference on Management of Data (DBTest) |
---|
Conference
Title | 2020 Workshop on Testing Database Systems, DBTest 2020 |
---|---|
Duration | 19 June 2020 |
City | Portland |
Country | USA/United States |
External IDs
Scopus | 85086066357 |
---|---|
ORCID | /0000-0001-8107-2775/work/142253453 |