Automatically Configuring Parallelism for Hybrid Layouts

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

  • Rana Faisal Munir, Chair of Databases, UPC Universitat Politècnica de Catalunya (Barcelona Tech) (Author)
  • Alberto Abelló, UPC Universitat Politècnica de Catalunya (Barcelona Tech) (Author)
  • Oscar Romero, UPC Universitat Politècnica de Catalunya (Barcelona Tech) (Author)
  • Maik Thiele, Chair of Databases (Author)
  • Wolfgang Lehner, Chair of Databases (Author)

Abstract

Distributed processing frameworks process data in parallel by dividing it into multiple partitions, each of which is processed in a separate task. The number of tasks is always determined by the total file size. However, this can lead to launching more tasks than needed in the case of hybrid layouts, because they help to read less data for certain operations (i.e., projection, selection). The over-provisioning of tasks may increase the job execution time and waste significant computing resources, because each task introduces extra overhead (e.g., initialization, garbage collection). To allow a more efficient use of resources and reduce the job execution time, we propose a cost-based approach that decides the number of tasks based on the data being read. The proposed cost model can be used in a multi-objective approach to decide both the number of tasks and the number of machines for execution.
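The difference between sizing tasks on the total file size and sizing them on the data actually read can be illustrated with a minimal sketch. This is not the cost model from the paper, which the abstract does not spell out. The function names, the split size, and the assumption that projection and selection reduce the bytes read by independent multiplicative fractions are all hypothetical and chosen only for illustration.

```python
import math


def tasks_from_file_size(total_bytes: int, split_bytes: int) -> int:
    """Baseline described in the abstract: one task per split of the whole file."""
    return max(1, math.ceil(total_bytes / split_bytes))


def tasks_from_data_read(
    total_bytes: int,
    split_bytes: int,
    projected_fraction: float,
    selected_fraction: float,
) -> int:
    """Hypothetical alternative: size the task count on the estimated bytes read.

    projected_fraction: assumed share of bytes in the columns the query needs.
    selected_fraction:  assumed share of data that survives selection.
    Treating the two as independent multipliers is a simplifying assumption.
    """
    for name, value in (("projected_fraction", projected_fraction),
                        ("selected_fraction", selected_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    bytes_read = total_bytes * projected_fraction * selected_fraction
    return max(1, math.ceil(bytes_read / split_bytes))


GIB = 1024 ** 3
MIB = 1024 ** 2

# A 10 GiB file with an assumed 128 MiB split size.
baseline = tasks_from_file_size(10 * GIB, 128 * MIB)
# The query reads a quarter of the columns and half of the remaining data.
adjusted = tasks_from_data_read(10 * GIB, 128 * MIB, 0.25, 0.5)
print(baseline, adjusted)
```

Under these assumed numbers the baseline launches eight times as many tasks as the estimate based on data read, which is the kind of over-provisioning the abstract attributes to hybrid layouts.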

Details

Original language: English
Title: New Trends in Databases and Information Systems - ADBIS 2019 Short Papers, Workshops BBIGAP, QAUCA, SemBDM, SIMPDA, M2P, MADEISD, and Doctoral Consortium 2019, Proceedings
Editors: Tatjana Welzer, Vili Podgorelec, Aida Kamišalic Latific, Johann Eder, Robert Wrembel, Mikolaj Morzy, Mirjana Ivanovic, Johann Gamper, Theodoros Tzouramanis, Jérôme Darmont
Publisher: Springer Verlag
Pages: 120-125
Number of pages: 6
ISBN (Print): 9783030302771
Publication status: Published - 2019
Peer-reviewed: Yes

Publication series

Series: Communications in Computer and Information Science
Volume: 1064
ISSN: 1865-0929

Conference

Title: 23rd European Conference on Advances in Databases and Information Systems
Short title: ADBIS 2019
Event number: 23
Duration: 8 - 11 September 2019
Venue: Hotel Park
City: Bled
Country: Slovenia

External IDs

Scopus 85072990842
ORCID /0000-0001-8107-2775/work/142253463

Keywords

  • Big data, Hybrid storage layouts, Parallelism, Parquet, Spark