De2Dup: Extended Deduplication for Multi-Tenant Databases

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Abstract

Content-based page sharing (de-duplication) is a widely used technique for improving memory efficiency in virtualized systems by identifying and merging identical pages. For many years, the Linux kernel has offered this technique through its Kernel Samepage Merging (KSM) feature. Although KSM generally works well, it is not used in multi-tenant database systems, even though multiple tenants often manage similar data. One reason is that pages must be binary identical, which is a severe restriction. A second reason is that KSM is seemingly scheduled by the OS as a single-threaded process, independently of the database workload, which further limits its applicability to in-memory systems with terabytes of main memory. To overcome these limitations, we propose De2Dup, an extended de-duplication mechanism for memory-centric multi-tenant database engines. De2Dup extends de-duplication with a delta mechanism that significantly broadens its applicability, especially when pages are not binary identical. Moreover, De2Dup allows the search for duplicates to be steered and incurs low overhead, because its complete execution can be offloaded asynchronously to a modern on-chip accelerator for memory operations on recent Intel server processors. In addition, De2Dup offers an efficient way to perform on-the-fly, tenant-aware data reconstruction during scan operations.
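
To make the delta idea in the abstract concrete, the following is a minimal software sketch, not the authors' implementation. It assumes 4 KiB pages compared in 8-byte words, and the names `create_delta` and `apply_delta` are hypothetical. A near-identical page is stored as a shared base page plus a list of differing words, and is rebuilt from both when it is read. The keywords name Intel DSA as the accelerator; this sketch runs in plain Python and makes no claim about how the paper uses that hardware.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
WORD = 8          # assumed comparison granularity in bytes


def create_delta(base: bytes, page: bytes) -> list:
    """Return (offset, word) pairs for every word where page differs from base."""
    if len(base) != len(page) or len(base) % WORD != 0:
        raise ValueError("pages must have equal, word-aligned length")
    delta = []
    for off in range(0, len(base), WORD):
        chunk = page[off:off + WORD]
        if chunk != base[off:off + WORD]:
            delta.append((off, chunk))
    return delta


def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild a tenant's page from the shared base page plus its delta."""
    page = bytearray(base)
    for off, chunk in delta:
        page[off:off + WORD] = chunk
    return bytes(page)


# Two tenants hold pages that differ in a single 8-byte word.
base_page = bytes(PAGE_SIZE)
tenant_buf = bytearray(base_page)
tenant_buf[16:24] = b"tenant-B"
tenant_page = bytes(tenant_buf)

delta = create_delta(base_page, tenant_page)   # one (offset, word) entry
restored = apply_delta(base_page, delta)       # identical to tenant_page
```

In this toy case the second tenant's page is represented by one delta entry instead of a full 4 KiB copy, which is the kind of saving a delta mechanism targets when pages are similar but not binary identical.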

Details

Original language: English
Title: 21st International Workshop on Data Management on New Hardware, DaMoN 2025
Publisher: Association for Computing Machinery, Inc
Number of pages: 9
ISBN (electronic): 979-8-4007-1940-0
Publication status: Published - 10 July 2025
Peer-reviewed: Yes

Workshop

Title: 21st International Workshop on Data Management on New Hardware
Short title: DaMoN 2025
Event number: 21
Duration: 23 June 2025
Location: Intercontinental Berlin
City: Berlin
Country: Germany

External IDs

ORCID /0000-0001-8107-2775/work/194824064

Keywords

  • Data Access, De-Duplication, Intel DSA, Multi-Tenancy