De2Dup: Extended Deduplication for Multi-Tenant Databases

Research output: Contribution to book/Conference proceedings/Anthology/Report > Conference contribution > Contributed > peer-review

Abstract

Content-based page sharing (de-duplication) is a widely used technique for improving memory efficiency in virtualized systems by identifying and merging identical pages. The Linux kernel has offered this technique for many years through its Kernel Samepage Merging (KSM) feature. Although KSM generally works well, it is not used in multi-tenant database systems, even though multiple tenants often manage similar data. One reason is that pages must be binary identical, which is a severe restriction. A second reason is that KSM is seemingly scheduled by the OS as a single-threaded process, independently of the database workload, which further limits its applicability to in-memory systems with terabytes of main memory. To overcome these limitations, we propose De2Dup, an extended de-duplication mechanism for memory-centric multi-tenant database engines. De2Dup extends de-duplication with a delta mechanism that significantly broadens its applicability, especially when pages are not binary identical. Moreover, De2Dup allows the search for duplicates to be steered and incurs low overhead, because its complete execution can be offloaded asynchronously to a modern on-chip accelerator for memory operations available on recent Intel server processors. In addition, De2Dup offers an efficient way to reconstruct data on the fly in a tenant-aware manner during scan operations.
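
To make the delta idea in the abstract concrete, the following is a minimal software-only sketch, not the authors' implementation. It assumes 4 KiB pages compared in 8-byte words, a greedy search for a base page, and a hypothetical threshold `max_delta_words` that decides when a page is "similar enough" to share. The names `create_delta`, `apply_delta`, and `dedup` are illustrative only, and no accelerator offload is modeled.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # bytes per page (assumption of this sketch)
WORD = 8          # comparison granularity in bytes (assumption of this sketch)

# A delta is a list of (byte offset, replacement word) pairs.
Delta = List[Tuple[int, bytes]]


def create_delta(base: bytes, page: bytes) -> Delta:
    """Return the words of `page` that differ from `base`."""
    assert len(base) == len(page) == PAGE_SIZE
    return [
        (off, page[off:off + WORD])
        for off in range(0, PAGE_SIZE, WORD)
        if base[off:off + WORD] != page[off:off + WORD]
    ]


def apply_delta(base: bytes, delta: Delta) -> bytes:
    """Rebuild a tenant's page from the shared base page and its delta."""
    buf = bytearray(base)
    for off, word in delta:
        buf[off:off + WORD] = word
    return bytes(buf)


def dedup(pages: Dict[str, bytes], max_delta_words: int = 16):
    """Greedy de-duplication with deltas (hypothetical policy).

    Each page is stored as (base index, delta) against the first base
    page that differs in at most `max_delta_words` words; otherwise the
    page becomes a new base with an empty delta.
    """
    bases: List[bytes] = []
    mapping: Dict[str, Tuple[int, Delta]] = {}
    for tenant, page in pages.items():
        for idx, base in enumerate(bases):
            delta = create_delta(base, page)
            if len(delta) <= max_delta_words:
                mapping[tenant] = (idx, delta)
                break
        else:
            # No sufficiently similar base found: keep the page as a new base.
            bases.append(page)
            mapping[tenant] = (len(bases) - 1, [])
    return bases, mapping
```

In this sketch, a scan over one tenant's data would call `apply_delta(bases[idx], delta)` for each page of that tenant, which corresponds loosely to the tenant-aware on-the-fly reconstruction the abstract mentions. Binary-identical pages are simply the special case of an empty delta.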

Details

Original language: English
Title of host publication: 21st International Workshop on Data Management on New Hardware, DaMoN 2025
Publisher: Association for Computing Machinery, Inc
Number of pages: 9
ISBN (electronic): 979-8-4007-1940-0
Publication status: Published - 10 Jul 2025
Peer-reviewed: Yes

Workshop

Title: 21st International Workshop on Data Management on New Hardware
Abbreviated title: DaMoN 2025
Conference number: 21
Duration: 23 June 2025
Location: Intercontinental Berlin
City: Berlin
Country: Germany

External IDs

ORCID /0000-0001-8107-2775/work/194824064

Keywords

  • Data Access, De-Duplication, Intel DSA, Multi-Tenancy