Resilient store: A heuristic-based data format selector for intermediate results

Rana Faisal Munir; Oscar Romero; Alberto Abelló; Besim Bilalli; Maik Thiele; Wolfgang Lehner

doi:10.1007/978-3-319-45547-1_4

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Rana Faisal Munir, UPC Universitat Politècnica de Catalunya (Barcelona Tech) (Author)
Oscar Romero, UPC Universitat Politècnica de Catalunya (Barcelona Tech) (Author)
Alberto Abelló, UPC Universitat Politècnica de Catalunya (Barcelona Tech) (Author)
Besim Bilalli, UPC Universitat Politècnica de Catalunya (Barcelona Tech) (Author)
Maik Thiele, Technische Universität Dresden (Author)
Wolfgang Lehner, Technische Universität Dresden (Author)

Abstract

Large-scale data analysis is an important activity in many organizations and typically requires the deployment of data-intensive workflows. As data is processed, these workflows generate large intermediate results, which are typically pipelined from one operator to the next. However, if materialized, these results become reusable, so subsequent workflows need not recompute them. Many existing solutions materialize intermediate results, but all of them assume a fixed data format. A fixed format, however, may not be optimal for every situation. For example, it is well known that different data fragmentation strategies (e.g., horizontal and vertical) perform better or worse depending on the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists in selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.
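
To make the idea of rule-based format selection concrete, the following is a minimal Python sketch. It is not the authors' rule set: the `AccessPattern` fields, the `choose_format` function, the 0.5 projection threshold, and the rule that separates Avro from SequenceFile are all hypothetical and chosen only for illustration. The one general fact it relies on is that Parquet is a columnar format, while Avro and SequenceFile store data row by row.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessPattern:
    """Hypothetical summary of how one downstream operator reads a result."""
    columns_read: int     # number of columns the operator touches
    total_columns: int    # number of columns in the intermediate result
    has_selection: bool   # True if the operator filters rows with a predicate


def choose_format(patterns: List[AccessPattern],
                  projection_threshold: float = 0.5) -> str:
    """Pick a format name using illustrative rules (assumed, not from the paper)."""
    if not patterns:
        # No known consumers: fall back to an arbitrary default.
        return "SequenceFile"

    # Average fraction of columns read by the downstream operators.
    avg_fraction = sum(p.columns_read / p.total_columns
                       for p in patterns) / len(patterns)

    # Assumed rule 1: narrow projections favour a columnar layout.
    if avg_fraction < projection_threshold:
        return "Parquet"

    # Assumed rule 2 (placeholder): wide reads with row filters go to Avro.
    if any(p.has_selection for p in patterns):
        return "Avro"

    # Assumed rule 3 (placeholder): wide, unfiltered scans go to SequenceFile.
    return "SequenceFile"


if __name__ == "__main__":
    consumers = [AccessPattern(columns_read=2, total_columns=10, has_selection=False)]
    print(choose_format(consumers))
```

A real selector would derive these statistics from the workflow's operators at each materialization point and would need rules validated by measurement; the thresholds here are placeholders.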

Details

Original language	English
Title	Model and Data Engineering
Editors	Óscar Pastor, Jesús M. Almendros Jiménez, Yamine Aït-Ameur, Ladjel Bellatreche
Publisher	Springer Verlag
Pages	42-56
Number of pages	15
ISBN (Print)	9783319455464
Publication status	Published - 2016
Peer-reviewed	Yes
Externally published	Yes

Publication series

Series	Lecture Notes in Computer Science, Volume 9893
ISSN	0302-9743

Conference

Title	6th International Conference on Model and Data Engineering, MEDI 2016
Duration	21 - 23 September 2016
City	Almeria
Country	Spain

External IDs

ORCID	/0000-0001-8107-2775/work/142253538
