ATUN-HL: Auto Tuning of Hybrid Layouts Using Workload and Data Characteristics

Research output: Contribution to book/Conference proceedings/Anthology/Report, Conference contribution, Contributed, peer-review

Contributors

  • Rana Faisal Munir, Chair of Databases, UPC Polytechnic University of Catalonia (Barcelona Tech) (Author)
  • Alberto Abelló, UPC Polytechnic University of Catalonia (Barcelona Tech) (Author)
  • Oscar Romero, UPC Polytechnic University of Catalonia (Barcelona Tech) (Author)
  • Maik Thiele, Chair of Databases (Author)
  • Wolfgang Lehner, Chair of Databases (Author)

Abstract

Ad-hoc analysis implies processing data in near real-time. Thus, raw data (i.e., neither normalized nor transformed) is typically dumped into a distributed engine, where it is generally stored in a hybrid layout. Hybrid layouts divide data into horizontal partitions, and inside each partition, data are stored vertically (column by column). They keep statistics for each horizontal partition and also support encoding (e.g., dictionary encoding) and compression to reduce the size of the data. Their built-in support for many ad-hoc operations (e.g., selection, projection, and aggregation) makes hybrid layouts the best choice for most operations. The sizes of the horizontal partitions and of the dictionaries are configurable and can directly impact the performance of analytical queries. Hence, their default configuration cannot be expected to be optimal for all scenarios. In this paper, we present ATUN-HL (Auto TUNing Hybrid Layouts), which, based on a cost model and given the workload and the characteristics of the data, finds the best values for these parameters. To show its effectiveness, we prototyped ATUN-HL for Apache Parquet, an open-source implementation of hybrid layouts in the Hadoop Distributed File System (HDFS). Our experimental evaluation shows that ATUN-HL provides, on average, 85% of all the potential performance improvement and a 1.2x average speedup over the default configuration.
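To make the layout described above concrete, here is a minimal Python sketch of a hybrid layout. It assumes a toy in-memory model, not Parquet's actual file format and not ATUN-HL's cost model: rows are split into horizontal partitions of `row_group_size` rows, each column of a partition is dictionary-encoded, and per-partition min/max statistics let an equality selection skip partitions that cannot match. All names (`ColumnChunk`, `RowGroup`, `write_hybrid`, `count_equal`) are hypothetical and chosen only for illustration.

```python
# Hypothetical toy model of a hybrid layout (horizontal partitions stored
# column by column); not Parquet's on-disk format and not ATUN-HL's cost model.
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class ColumnChunk:
    """One column of one horizontal partition, dictionary-encoded."""
    dictionary: List[Any]  # distinct values in first-seen order
    codes: List[int]       # one small integer per row, indexing into `dictionary`


@dataclass
class RowGroup:
    """A horizontal partition whose columns are stored separately (vertically)."""
    num_rows: int
    columns: Dict[str, ColumnChunk]
    stats: Dict[str, Tuple[Any, Any]]  # per-column (min, max) statistics


def dictionary_encode(values: Sequence[Any]) -> ColumnChunk:
    """Replace each value with the index of its entry in a per-chunk dictionary."""
    index: Dict[Any, int] = {}
    dictionary: List[Any] = []
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return ColumnChunk(dictionary, codes)


def write_hybrid(rows: Sequence[Tuple[Any, ...]], schema: Sequence[str],
                 row_group_size: int) -> List[RowGroup]:
    """Split rows horizontally, then store each partition column by column."""
    groups: List[RowGroup] = []
    for start in range(0, len(rows), row_group_size):
        part = rows[start:start + row_group_size]
        columns: Dict[str, ColumnChunk] = {}
        stats: Dict[str, Tuple[Any, Any]] = {}
        for i, name in enumerate(schema):
            values = [row[i] for row in part]
            columns[name] = dictionary_encode(values)
            stats[name] = (min(values), max(values))
        groups.append(RowGroup(len(part), columns, stats))
    return groups


def count_equal(groups: Sequence[RowGroup], column: str,
                value: Any) -> Tuple[int, int]:
    """Evaluate `column = value`; return (matching rows, rows whose codes were compared)."""
    matches = scanned = 0
    for g in groups:
        lo, hi = g.stats[column]
        if value < lo or value > hi:
            continue  # min/max statistics prove no row here matches: skip partition
        chunk = g.columns[column]
        if value not in chunk.dictionary:
            continue  # value absent from this partition's dictionary
        code = chunk.dictionary.index(value)
        scanned += g.num_rows
        # Compare small integer codes instead of the raw values.
        matches += sum(1 for c in chunk.codes if c == code)
    return matches, scanned
```

With 1,000 rows sorted by `id`, partitions of 100 rows let `count_equal(groups, "id", 42)` compare only 100 rows, whereas a single 1,000-row partition forces all 1,000 to be compared; this is one way partition size affects query cost. The sketch dictionary-encodes every column unconditionally; in parquet-mr, by contrast, a writer falls back to plain encoding once a column chunk's dictionary exceeds a configured size.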

Details

Original language: English
Title of host publication: Advances in Databases and Information Systems - 22nd European Conference, ADBIS 2018, Proceedings
Editors: Andras Benczur, Tomas Horvath, Bernhard Thalheim
Publisher: Springer, Berlin [et al.]
Pages: 200-215
Number of pages: 16
ISBN (print): 9783319983974
Publication status: Published - 2018
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science, Volume 11019
ISSN: 0302-9743

Conference

Title: 22nd East-European Conference on Advances in Databases and Information Systems, ADBIS 2018
Duration: 2-5 September 2018
City: Budapest
Country: Hungary

External IDs

Scopus: 85051071595
ORCID: /0000-0001-8107-2775/work/142253506

Keywords

  • Auto tuning, Big data, Hybrid storage layouts, Parquet