DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Patrick Damme; Marius Birkenbach; Constantinos Bitsakos; Matthias Boehm; Philippe Bonnet; Florina Ciorba; Mark Dokter; Pawel Dowgiallo; Ahmed Eleliemy; Christian Faerber; Georgios Goumas; Dirk Habich; Niclas Hedam; Marlies Hofer; Wenjun Huang; Kevin Innerebner; Vasileios Karakostas; Roman Kern; Tomaž Kosar; Alexander Krause; Daniel Krems; Andreas Laber; Wolfgang Lehner; Eric Mier; Marcus Paradies; Bernhard Peischl; Gabrielle Poerwawinata; Stratos Psomadakis; Tilmann Rabl; Piotr Ratuszniak; Pedro Silva; Nikolai Skuppin; Andreas Starzacher; Benjamin Steinwender; Ilin Tolovski; Pınar Tözün; Wojciech Ulatowski; Yuanyuan Wang; Izajasz Wrosz; Aleš Zamuda; Ce Zhang; Xiao Xiang Zhu

Research output: Contribution to conferences › Paper › Contributed › peer-review

Contributors

Patrick Damme, Graz University of Technology (Author)
Marius Birkenbach, KAI GmbH (Author)
Constantinos Bitsakos, National Technical University of Athens (Author)
Matthias Boehm, Graz University of Technology (Author)
Philippe Bonnet, IT University of Copenhagen (Author)
Florina Ciorba, University of Basel (Author)
Mark Dokter, Graz University of Technology (Author)
Pawel Dowgiallo, Intel Corporation (Author)
Ahmed Eleliemy, University of Basel (Author)
Christian Faerber, Intel Corporation (Author)
Georgios Goumas, National Technical University of Athens (Author)
Dirk Habich, Chair of Databases (Author)
Niclas Hedam, IT University of Copenhagen (Author)
Marlies Hofer, AVL List GmbH (Author)
Wenjun Huang, German Aerospace Center (DLR) - Standort Jena (Author)
Kevin Innerebner, Graz University of Technology (Author)
Vasileios Karakostas, National Technical University of Athens (Author)
Roman Kern, Graz University of Technology (Author)
Tomaž Kosar, University of Maribor (Author)
Alexander Krause, Chair of Databases (Author)
Daniel Krems, AVL List GmbH (Author)
Andreas Laber, Infineon Technologies AG (Author)
Wolfgang Lehner, Chair of Databases (Author)
Eric Mier, Chair of Databases (Author)
Marcus Paradies, German Aerospace Center (DLR) - Standort Jena (Author)
Bernhard Peischl, AVL List GmbH (Author)
Gabrielle Poerwawinata, University of Basel (Author)
Stratos Psomadakis, National Technical University of Athens (Author)
Tilmann Rabl, University of Potsdam (Author)
Piotr Ratuszniak, Intel Corporation (Author)
Pedro Silva, University of Potsdam (Author)
Nikolai Skuppin, Technical University of Munich (Author)
Andreas Starzacher, Infineon Technologies AG (Author)
Benjamin Steinwender, KAI GmbH (Author)
Ilin Tolovski, University of Potsdam (Author)
Pınar Tözün, IT University of Copenhagen (Author)
Wojciech Ulatowski, Intel Corporation (Author)
Yuanyuan Wang, Technical University of Munich (Author)
Izajasz Wrosz, Intel Corporation (Author)
Aleš Zamuda, University of Maribor (Author)
Ce Zhang, ETH Zurich (Author)
Xiao Xiang Zhu, Technical University of Munich (Author)

Abstract

Integrated data analysis (IDA) pipelines, which combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring, are becoming increasingly common in practice. Interestingly, systems in these areas share many compilation and runtime techniques, and the increasingly heterogeneous hardware infrastructure they use is converging as well. Yet programming paradigms, cluster resource management, data formats and representations, and execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage, with the aim of increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines and describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, and local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results.

Details

Original language	English
Publication status	Published - Jan 2022
Peer-reviewed	Yes

Conference

Title	12th Annual Conference on Innovative Data Systems Research
Abbreviated title	CIDR 2022
Conference number	12
Duration	9 - 12 January 2022
Website	https://www.cidrdb.org/cidr2022/
Location	Chaminade Resort and Spa & Online
City	Santa Cruz
Country	United States of America

External IDs

ORCID	/0000-0001-8107-2775/work/176861684

Keywords

Research priority areas of TU Dresden

Data-intensive Sciences
