DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Publication: Contribution to conferences, Paper, Contributed, Peer-reviewed

Contributors

  • Patrick Damme, Technische Universität Graz (Author)
  • Marius Birkenbach, KAI GmbH (Author)
  • Constantinos Bitsakos, National Technical University of Athens (Author)
  • Matthias Boehm, Technische Universität Graz (Author)
  • Philippe Bonnet, IT University of Copenhagen (Author)
  • Florina Ciorba, Universität Basel (Author)
  • Mark Dokter, Technische Universität Graz (Author)
  • Pawel Dowgiallo, Intel Corporation (Author)
  • Ahmed Eleliemy, Universität Basel (Author)
  • Christian Faerber, Intel Corporation (Author)
  • Georgios Goumas, National Technical University of Athens (Author)
  • Dirk Habich, Professur für Datenbanken (Author)
  • Niclas Hedam, IT University of Copenhagen (Author)
  • Marlies Hofer, AVL List GmbH (Author)
  • Wenjun Huang, Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR) - Standort Jena (Author)
  • Kevin Innerebner, Technische Universität Graz (Author)
  • Vasileios Karakostas, National Technical University of Athens (Author)
  • Roman Kern, Technische Universität Graz (Author)
  • Tomaž Kosar, University of Maribor (Author)
  • Alexander Krause, Professur für Datenbanken (Author)
  • Daniel Krems, AVL List GmbH (Author)
  • Andreas Laber, Infineon Technologies AG (Author)
  • Wolfgang Lehner, Professur für Datenbanken (Author)
  • Eric Mier, Professur für Datenbanken (Author)
  • Marcus Paradies, Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR) - Standort Jena (Author)
  • Bernhard Peischl, AVL List GmbH (Author)
  • Gabrielle Poerwawinata, Universität Basel (Author)
  • Stratos Psomadakis, National Technical University of Athens (Author)
  • Tilmann Rabl, Universität Potsdam (Author)
  • Piotr Ratuszniak, Intel Corporation (Author)
  • Pedro Silva, Universität Potsdam (Author)
  • Nikolai Skuppin, Technische Universität München (Author)
  • Andreas Starzacher, Infineon Technologies AG (Author)
  • Benjamin Steinwender, KAI GmbH (Author)
  • Ilin Tolovski, Universität Potsdam (Author)
  • Pınar Tözün, IT University of Copenhagen (Author)
  • Wojciech Ulatowski, Intel Corporation (Author)
  • Yuanyuan Wang, Technische Universität München (Author)
  • Izajasz Wrosz, Intel Corporation (Author)
  • Aleš Zamuda, University of Maribor (Author)
  • Ce Zhang, ETH Zürich (Author)
  • Xiao Xiang Zhu, Technische Universität München (Author)

Abstract

Integrated data analysis (IDA) pipelines, which combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring, are becoming increasingly common in practice. Interestingly, systems in these areas share many compilation and runtime techniques, and the increasingly heterogeneous hardware infrastructure they use is converging as well. Yet the programming paradigms, cluster resource management, data formats and representations, and execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines and describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results.

Details

Original language: English
Publication status: Published - Jan. 2022
Peer-review status: Yes

Conference

Title: 12th Annual Conference on Innovative Data Systems Research
Short title: CIDR 2022
Event number: 12
Duration: 9 - 12 January 2022
Location: Chaminade Resort and Spa & Online
City: Santa Cruz
Country: United States

External IDs

ORCID /0000-0001-8107-2775/work/176861684