Cache-efficient aggregation: Hashing is sorting

Publikation: Beitrag in Buch/Konferenzbericht/Sammelband/GutachtenBeitrag in KonferenzbandBeigetragenBegutachtung

Beitragende

  • Ingo Müller - , Karlsruher Institut für Technologie, SAP Research (Autor:in)
  • Peter Sanders - , Karlsruher Institut für Technologie (Autor:in)
  • Arnaud Lacurie - , SAP Research (Autor:in)
  • Wolfgang Lehner - , Professur für Datenbanken (Autor:in)
  • Franz Färber - , Karlsruher Institut für Technologie (Autor:in)

Abstract

For decades researchers have studied the duality of hashing and sorting for the implementation of the relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specifically implemented algorithms, and the data sets used in the experiments, different authors came to different conclusions about which is the better approach. In this paper we argue that in terms of cache efficiency, the two paradigms are actually the same. We support our claim by showing that the complexity of hashing is the same as the complexity of sorting in the external memory model. Furthermore we make the similarity of the two approaches obvious by designing an algorithmic framework that allows to switch seamlessly between hashing and sorting during execution. The fact that we mix hashing and sorting routines in the same algorithmic framework allows us to leverage the advantages of both approaches and makes their similarity obvious. On a more practical note, we also show how to achieve very low constant factors by tuning both the hashing and the sorting routines to modern hardware. Since we observe a complementary dependency of the constant factors of the two routines to the locality of the input, we exploit our framework to switch to the faster routine where appropriate. The result is a novel relational aggregation algorithm that is cache-efficient-independently and without prior knowledge of input skew and output cardinality-, highly parallelizable on modern multi-core systems, and operating at a speed close to the memory bandwidth, thus outperforming the state-of-the-art by up to 3:7×.

Details

OriginalspracheEnglisch
TitelSIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Herausgeber (Verlag)Association for Computing Machinery
Seiten1123-1136
Seitenumfang14
ISBN (elektronisch)9781450327589
PublikationsstatusVeröffentlicht - 27 Mai 2015
Peer-Review-StatusJa

Konferenz

TitelACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Dauer31 Mai - 4 Juni 2015
StadtMelbourne
LandAustralien

Externe IDs

ORCID /0000-0001-8107-2775/work/198592324

Schlagworte

ASJC Scopus Sachgebiete

Schlagwörter

  • "Group by", Adaptive algorithm, Aggregation, Cache-efficient, Grouping, Hashing, Robust performance, Shared-memory, Sorting