Machine Learning-based Cardinality Estimation in DBMS on Pre-Aggregated Data

Lucas Woltmann; Claudio Hartmann; Dirk Habich; Wolfgang Lehner

doi:10.48550/arXiv.2005.09367

Research output: Preprint/Documentation/Report › Preprint

Contributors

Lucas Woltmann, Chair of Databases (Author)
Claudio Hartmann, Chair of Databases (Author)
Dirk Habich, Chair of Databases (Author)
Wolfgang Lehner, Chair of Databases (Author)

Abstract

Cardinality estimation is a fundamental task in database query processing and optimization. As recent papers have shown, machine learning (ML)-based approaches can deliver more accurate cardinality estimates than traditional approaches. However, learning a data-dependent ML model requires executing many training queries during the model training phase, which makes training very time-consuming. Many of these training (example) queries use the same base data and the same query structure, differing only in their selection predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches. As we show with different workloads in our evaluation, our aggregate-based training phase achieves an average speedup of 63 and thus outperforms indexes.
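A minimal sketch of the core idea, under assumptions that are ours rather than the paper's: a single table, conjunctive range predicates on a known set of columns, and COUNT(*) as the training label. The names `pre_aggregate`, `count_on_base`, and `count_on_aggregate` are hypothetical. Grouping once by the predicate columns lets every training query be labeled by summing the counts of qualifying groups instead of scanning the base rows, and both paths should return the same cardinality.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[int, ...]
# Hypothetical predicate format: column index -> inclusive (low, high) range.
Predicate = Dict[int, Tuple[int, int]]


def pre_aggregate(rows: List[Row], columns: List[int]) -> Counter:
    """Group rows by the predicate columns and count each group,
    analogous to SELECT c1, ..., ck, COUNT(*) ... GROUP BY c1, ..., ck."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


def count_on_base(rows: List[Row], predicate: Predicate) -> int:
    """Label a training query by scanning every base row."""
    return sum(
        1 for row in rows
        if all(lo <= row[c] <= hi for c, (lo, hi) in predicate.items())
    )


def count_on_aggregate(agg: Counter, columns: List[int], predicate: Predicate) -> int:
    """Label the same query by summing the counts of qualifying groups."""
    position = {c: i for i, c in enumerate(columns)}
    missing = set(predicate) - set(position)
    if missing:
        # The aggregate can only answer predicates on columns it was grouped by.
        raise ValueError(f"predicate uses non-aggregated columns: {sorted(missing)}")
    return sum(
        count for key, count in agg.items()
        if all(lo <= key[position[c]] <= hi for c, (lo, hi) in predicate.items())
    )


if __name__ == "__main__":
    # Toy table (a, b, payload); only a and b appear in predicates.
    rows = [(1, 10, 100), (1, 20, 101), (2, 10, 102), (2, 10, 103), (3, 30, 104)]
    columns = [0, 1]
    agg = pre_aggregate(rows, columns)  # 4 groups instead of 5 rows

    # Training queries share structure and differ only in predicate ranges.
    workload = [{0: (1, 2), 1: (10, 10)}, {1: (10, 30)}, {0: (3, 3)}]
    for predicate in workload:
        label = count_on_aggregate(agg, columns, predicate)
        assert label == count_on_base(rows, predicate)
        print(predicate, label)
```

In this toy the saving is negligible; any benefit depends on how much smaller the aggregate is than the base table, which is why the choice of a predicate-independent pre-aggregation matters.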

Details

Original language	English
Number of pages	10
Publication status	Published - 19 May 2020

External IDs

ORCID	/0000-0003-0720-8878/work/142659640
ORCID	/0000-0001-8107-2775/work/142660531

Keywords

cs.DB

Research Portal of the TU Dresden

Related content

Aggregate-based Training Phase for ML-based Cardinality Estimation.