Fast approximated nearest neighbor joins for relational database systems

Publication: Contribution to book/conference proceedings/anthology/expert opinion; Contribution to conference proceedings; Contributed; Peer-reviewed

Abstract

K-nearest neighbor search (kNN-Search) is a universal data processing technique and a fundamental operation on word embeddings trained by word2vec or related approaches. The benefits that operations on dense vectors such as word embeddings bring to the analytical functionality of RDBMSs motivate an integration of kNN-Joins. However, both kNN-Search and kNN-Joins have so far barely been integrated into relational database systems. In this paper, we develop an index structure for approximated kNN-Joins that works well on high-dimensional data, and we integrate it into PostgreSQL. The novel index structure is efficient for different cardinalities of the involved join partners. An evaluation of the system on word-embedding applications demonstrates the benefits of such an integrated kNN-Join operation and the performance of the proposed approach.
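To make the join semantics concrete, the following is a minimal sketch of an exact, brute-force kNN-Join: for every vector of an outer relation R, it returns the k vectors of an inner relation S with the highest cosine similarity. It is not the index structure proposed in the paper; it only illustrates the result that an approximated kNN-Join tries to reproduce at lower cost. The helper names `cosine` and `knn_join`, the choice of cosine similarity, and the toy vectors are all assumptions made for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_join(
    r: Dict[str, Vector], s: Dict[str, Vector], k: int
) -> List[Tuple[str, str, float]]:
    """Exact kNN-Join (hypothetical helper, brute force).

    Pairs every key of the outer relation R with the k most similar keys of
    the inner relation S. Cost is O(|R| * |S| * d) similarity work, which is
    what an approximate index is meant to avoid.
    """
    result: List[Tuple[str, str, float]] = []
    for r_key, r_vec in r.items():
        # heapq.nlargest keeps only the k best (similarity, key) candidates
        best = heapq.nlargest(
            k, ((cosine(r_vec, s_vec), s_key) for s_key, s_vec in s.items())
        )
        # Emit one joined row per neighbor, most similar first
        result.extend((r_key, s_key, sim) for sim, s_key in best)
    return result


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" (invented values, not real word2vec output)
    queries = {"king": [0.9, 0.1, 0.0]}
    targets = {
        "queen": [0.8, 0.2, 0.1],
        "man": [0.7, 0.0, 0.0],
        "apple": [0.0, 0.1, 0.9],
    }
    for row in knn_join(queries, targets, k=2):
        print(row)
```

With these toy vectors, "man" and "queen" are returned as the two nearest neighbors of "king", while "apple" is excluded. Because the brute-force loop touches every pair of R and S, its cost grows with the product of the two join partners' cardinalities, which is why an index that stays efficient for different cardinalities matters.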

Details

Original language: English
Title: Datenbanksysteme für Business, Technologie und Web, BTW 2019 and 18. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", DBIS 2019
Editors: Torsten Grust, Felix Naumann, Alexander Böhm, Wolfgang Lehner, Theo Härder, Erhard Rahm, Andreas Heuer, Meike Klettke, Holger Meyer
Publisher: Gesellschaft für Informatik (GI)
Pages: 225-244
Number of pages: 20
ISBN (electronic): 9783885796831
Publication status: Published, 2019
Peer-reviewed: Yes

Publication series

Series: GI-Edition: Lecture Notes in Informatics. Proceedings
Volume: P-289
ISSN: 1617-5468

Conference

Title: Datenbanksysteme für Business, Technologie und Web, BTW 2019 and 18. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", DBIS 2019 (Database Systems for Business, Technology and Web, BTW 2019 and 18th Symposium of the GI Department "Databases and Information Systems", DBIS 2019)
Dates: 4-8 March 2019
City: Rostock
Country: Germany

External IDs

Scopus: 85072108736
ORCID: /0000-0001-8107-2775/work/142253468

Keywords

  • Approximated nearest neighbor search, Product quantization, RDBMS, Word embeddings