FacetE: Exploiting web tables for domain-specific word embedding evaluation
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Today's natural language processing and information retrieval systems depend heavily on word embedding techniques to represent text values. However, choosing a word embedding dataset for a specific task is not trivial. Current word embedding evaluation methods mostly provide only a one-dimensional quality measure, which does not reveal how knowledge from different domains is represented in the word embedding models. To overcome this limitation, we provide a new evaluation dataset, called FacetE, derived from 125 million web tables, which enables domain-sensitive evaluation. We show that FacetE can be used effectively to evaluate word embedding models. Our evaluation of common general-purpose word embedding models suggests that currently no single word embedding model is best for every domain.
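To illustrate the difference between a single aggregate score and a domain-sensitive one, the following is a minimal sketch, not FacetE's actual procedure. It assumes that each domain is represented by word pairs standing in for two related table columns (e.g. country and capital) and scores each domain with a standard analogy-offset test (nearest neighbour of b - a + c). The function names, the `DOMAINS` grouping, and the toy vectors in `TOY_EMB` are all hypothetical.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def analogy_accuracy(emb, pairs):
    """Share of analogies a:b :: c:? where the nearest word to b - a + c is d.

    Hypothetical scoring: every ordered combination of two pairs from the
    same domain forms one analogy question.
    """
    hits = total = 0
    for (a, b), (c, d) in itertools.permutations(pairs, 2):
        if not all(w in emb for w in (a, b, c, d)):
            continue  # skip questions with out-of-vocabulary words
        target = [eb - ea + ec for ea, eb, ec in zip(emb[a], emb[b], emb[c])]
        candidates = [w for w in emb if w not in (a, b, c)]
        best = max(candidates, key=lambda w: cosine(emb[w], target))
        if best == d:
            hits += 1
        total += 1
    return hits / total if total else float("nan")


def evaluate_by_domain(emb, domains):
    """Return one score per domain instead of a single aggregate number."""
    return {name: analogy_accuracy(emb, pairs) for name, pairs in domains.items()}


# Hypothetical toy embeddings: the geography offset is consistent, the music one is not.
TOY_EMB = {
    "france": [1.0, 0.0, 0.0], "paris": [1.0, 1.0, 0.0],
    "germany": [0.0, 0.0, 1.0], "berlin": [0.0, 1.0, 1.0],
    "beatles": [2.0, 0.0, 1.0], "lennon": [0.0, 2.0, 0.0],
    "queen": [0.0, 1.0, 2.0], "mercury": [2.0, 1.0, 0.0],
}

# Hypothetical domains, each a list of (column-1 value, column-2 value) pairs.
DOMAINS = {
    "geography": [("france", "paris"), ("germany", "berlin")],
    "music": [("beatles", "lennon"), ("queen", "mercury")],
}

if __name__ == "__main__":
    print(evaluate_by_domain(TOY_EMB, DOMAINS))
    # prints {'geography': 1.0, 'music': 0.0}
```

Averaging the two toy scores would report a single value of 0.5, hiding that the same embedding handles one domain perfectly and the other not at all, which is the kind of information a one-dimensional measure loses.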
Details
| Original language | English |
|---|---|
| Title of host publication | DBTest '20: Proceedings of the workshop on Testing Database Systems |
| Publisher | Association for Computing Machinery (ACM), New York |
| Pages | 1-6 |
| Number of pages | 6 |
| ISBN (electronic) | 978-1-4503-8001-0 |
| Publication status | Published - 19 Jun 2020 |
| Peer-reviewed | Yes |
Publication series
| Series | MOD: International Conference on Management of Data (DBTest) |
|---|---|
Workshop
| Title | 8th Workshop on Testing Database Systems |
|---|---|
| Abbreviated title | DBTest 2020 |
| Conference number | 8 |
| Description | held at ACM SIGMOD/PODS 2020 |
| Duration | 19 June 2020 |
| Location | Online |
| City | Portland |
| Country | United States of America |
External IDs
| Scopus | 85086066357 |
|---|---|
| ORCID | /0000-0001-8107-2775/work/142253453 |