Learning from Textual Data in Database Systems
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Relational database systems hold massive amounts of text that is valuable for many machine learning (ML) tasks. Since ML techniques depend on numerical input representations, pre-trained word embeddings are increasingly used to convert text values into meaningful numbers. However, a naïve one-to-one mapping of each word in a database to a word embedding vector fails to incorporate the rich context information given by the database schema. Thus, we propose a novel relational retrofitting framework, Retro, to learn numerical representations of text values in databases, capturing the rich information encoded by pre-trained word embedding models as well as context information provided by tabular and foreign key relations in the database. We define relational retrofitting as an optimization problem, present an efficient algorithm for solving it, and investigate the influence of various hyperparameters. Further, we develop simple feed-forward and complex graph convolutional neural network architectures to operate on those representations. Our evaluation shows that the proposed embeddings and models are ready to use for many ML tasks, such as text classification, imputation, and link prediction, and even outperform state-of-the-art techniques.
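To make the idea of retrofitting concrete, the following is a minimal sketch of the generic graph-based retrofitting update, in which each vector is repeatedly pulled toward the average of its related vectors while staying anchored to its pre-trained value. It is not the Retro objective or algorithm from the paper, which the abstract does not spell out. The function name `retrofit`, the weights `alpha` and `beta`, and the toy movie and director values are all assumptions chosen for illustration.

```python
def retrofit(original, neighbors, alpha=1.0, beta=1.0, iterations=10):
    """Generic graph-based retrofitting (illustrative, not the paper's Retro).

    original:  dict mapping a text value to its pre-trained vector (list of floats)
    neighbors: dict mapping a text value to the text values it is related to,
               e.g. through a foreign key relation
    """
    # Start from copies of the pre-trained vectors.
    vectors = {key: list(vec) for key, vec in original.items()}
    for _ in range(iterations):
        for key, base in original.items():
            related = [n for n in neighbors.get(key, []) if n in vectors]
            if not related:
                continue  # no relational context: keep the pre-trained vector
            denom = alpha + beta * len(related)
            # Weighted average of the pre-trained vector and the current
            # vectors of all related text values, per dimension.
            vectors[key] = [
                (alpha * base[d] + beta * sum(vectors[n][d] for n in related)) / denom
                for d in range(len(base))
            ]
    return vectors


# Hypothetical toy data: two text values linked by a foreign key relation.
pretrained = {"Inception": [1.0, 0.0], "Christopher Nolan": [0.0, 1.0]}
links = {"Inception": ["Christopher Nolan"], "Christopher Nolan": ["Inception"]}
retrofitted = retrofit(pretrained, links)
```

In this sketch, a larger `alpha` keeps vectors closer to the pre-trained embeddings, while a larger `beta` gives more weight to the relational context, so the two linked values end up closer to each other than they were initially.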
Details
Original language | English |
---|---|
Title | CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management |
Publisher | Association for Computing Machinery (ACM), New York |
Pages | 375-384 |
Number of pages | 10 |
Volume | 2020 |
ISBN (electronic) | 978-1-4503-6859-9 |
Publication status | Published - 19 Oct. 2020 |
Peer-review status | Yes |
Publication series
Series | CIKM: Conference on Information and Knowledge Management |
---|
Conference
Title | 29th ACM International Conference on Information and Knowledge Management |
---|---|
Short title | CIKM 2020 |
Duration | 19 - 23 October 2020 |
City | Virtual, Online |
Country | Ireland |
External IDs
Scopus | 85095865181 |
---|---|
ORCID | /0000-0001-8107-2775/work/142253587 |
Keywords
ASJC Scopus subject areas
Keywords
- relational database, retrofitting, word embedding