Enhanced Featurization of Queries with Mixed Combinations of Predicates for ML-based Cardinality Estimation

Publication: Contribution to book/conference proceedings/anthology/report > Contribution to conference proceedings > Contributed > Peer-reviewed

Abstract

Estimating query result sizes is a critical task in areas like query optimization. For some years now it has been popular to apply machine learning to this problem. However, surprisingly, there has been very little research yet on how to present queries to a machine learning model. Machine learning models do not simply consume SQL strings. Instead, a SQL string is transformed into a numerical representation. This transformation is called query featurization and is defined by a query featurization technique (QFT). This paper is concerned with QFTs for queries with many selection predicates. In particular, we consider queries that contain both predicates over different attributes and multiple predicates per attribute. We identify a desired property of query featurization and present three novel QFTs. To the best of our knowledge, we are the first to featurize queries with mixed combinations of predicates, i.e., containing both conjunctions and disjunctions. Our QFTs are model-independent and can serve as the query featurization layer for different machine learning model types. In our evaluation, we combine our QFTs with three different machine learning models. We demonstrate that the estimation accuracy of machine learning models significantly depends on the QFT used. In addition, we compare our best combination of QFT and machine learning model to various existing cardinality estimators.
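
To make the notion of query featurization concrete, here is a minimal sketch of one simple, generic encoding: each attribute is represented by a normalized lower and upper bound. This is not one of the three QFTs proposed in the paper. The schema, the attribute names, and the `featurize` function are hypothetical and chosen only for illustration, and only the operators `>=`, `<=`, and `=` are handled.

```python
from typing import Dict, List, Tuple

# Hypothetical schema (an assumption for this sketch): attribute -> (domain min, domain max).
SCHEMA: Dict[str, Tuple[float, float]] = {
    "age": (0.0, 100.0),
    "salary": (0.0, 200000.0),
}

# A predicate is (attribute, operator, literal).
Predicate = Tuple[str, str, float]


def featurize(predicates: List[Predicate]) -> List[float]:
    """Encode a conjunction of predicates as normalized [lower, upper] bounds per attribute."""
    # Start with the full domain of every attribute (no restriction).
    bounds = {attr: [lo, hi] for attr, (lo, hi) in SCHEMA.items()}
    for attr, op, value in predicates:
        interval = bounds[attr]
        if op == ">=":
            interval[0] = max(interval[0], value)
        elif op == "<=":
            interval[1] = min(interval[1], value)
        elif op == "=":
            interval[0] = max(interval[0], value)
            interval[1] = min(interval[1], value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    # Flatten to a fixed-length vector, scaling each bound into [0, 1].
    vector: List[float] = []
    for attr, (dmin, dmax) in SCHEMA.items():
        lo, hi = bounds[attr]
        span = dmax - dmin
        vector.extend([(lo - dmin) / span, (hi - dmin) / span])
    return vector


# WHERE age >= 25 AND age <= 50 AND salary <= 50000
example = featurize([("age", ">=", 25), ("age", "<=", 50), ("salary", "<=", 50000)])
print(example)  # [0.25, 0.5, 0.0, 0.25]
```

A single interval per attribute can only express conjunctions. A disjunction such as `age <= 20 OR age >= 60` has no representation in this vector, which illustrates why queries with mixed combinations of predicates call for a richer featurization.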

Details

Original language: English
Title: Proceedings of the 26th International Conference on Extending Database Technology (EDBT 2023)
Pages: 273-284
Number of pages: 12
Volume: 26
Edition: 2
Publication status: Published - 28 March 2023
Peer-review status: Yes

External IDs

Scopus 85150354662
ORCID /0000-0001-8107-2775/work/142253560