Aggregate-based Training Phase for ML-based Cardinality Estimation.

Lucas Woltmann; Claudio Hartmann; Dirk Habich; Wolfgang Lehner

doi:10.1007/S13222-021-00400-Z

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Lucas Woltmann, Chair of Databases (Author)
Claudio Hartmann, Chair of Databases (Author)
Dirk Habich, Chair of Databases (Author)
Wolfgang Lehner, Chair of Databases (Author)

Abstract

Cardinality estimation is a fundamental task in database query processing and optimization. As recent papers show, machine learning (ML)-based approaches may deliver more accurate cardinality estimates than traditional approaches. However, learning a data-dependent ML model requires executing many training queries, which makes the model training phase very time-consuming. Many of these training (or example) queries use the same base data, have the same query structure, and differ only in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches. As our evaluation with different workloads shows, the aggregate-based training phase achieves an average speedup of 90 and thus outperforms indexes.
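
The core idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the table contents, the column names (`age`, `dept_id`), the range-predicate query shape, and all function names are assumptions made up for illustration. The sketch groups the base rows once, independently of any predicate, and then labels every training query by summing group counts instead of rescanning the base data.

```python
from collections import Counter

# Hypothetical base table: each row is (age, dept_id). Values are made up.
ROWS = [(25, 1), (25, 1), (30, 2), (30, 2), (30, 2), (41, 1), (41, 3), (25, 3)]


def count_on_base(rows, age_range, dept_range):
    """Label a training query by scanning every base row."""
    (a_lo, a_hi), (d_lo, d_hi) = age_range, dept_range
    return sum(
        1 for age, dept in rows
        if a_lo <= age <= a_hi and d_lo <= dept <= d_hi
    )


def pre_aggregate(rows):
    """Predicate-independent step: GROUP BY (age, dept_id) with COUNT(*)."""
    return Counter(rows)


def count_on_aggregate(agg, age_range, dept_range):
    """Label the same query by summing the counts of the matching groups."""
    (a_lo, a_hi), (d_lo, d_hi) = age_range, dept_range
    return sum(
        n for (age, dept), n in agg.items()
        if a_lo <= age <= a_hi and d_lo <= dept <= d_hi
    )


# Example queries share the structure and differ only in their predicates.
TRAINING_QUERIES = [((20, 30), (1, 2)), ((30, 45), (1, 3)), ((25, 25), (3, 3))]

AGG = pre_aggregate(ROWS)  # computed once, reused for every query
LABELS = [count_on_aggregate(AGG, a, d) for a, d in TRAINING_QUERIES]
```

In this toy data the aggregate has 5 groups for 8 base rows; any saving in practice depends on how many rows share the same grouping values, which the sketch does not model.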

Details

Original language	English
Pages (from - to)	45-57
Number of pages	13
Journal	Datenbank-Spektrum : Zeitschrift für Datenbanktechnologie und Information Retrieval
Volume	22
Issue number	1
Publication status	Published - 2022
Peer-review status	Yes

External IDs

Mendeley	3a2bc1b7-fd21-389d-ad9a-602cc2e02d81
ORCID	/0000-0003-0720-8878/work/141545679
ORCID	/0000-0001-8107-2775/work/142253433

Keywords

Library keywords

004 Computer science

Related content

Machine Learning-based Cardinality Estimation in DBMS on Pre-Aggregated Data