Enhanced Featurization of Queries with Mixed Combinations of Predicates for ML-based Cardinality Estimation

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review



Estimating query result sizes is a critical task in areas like query optimization. For some years now, applying machine learning to this problem has been popular. Surprisingly, however, very little research has so far addressed how to present queries to a machine learning model. Machine learning models do not simply consume SQL strings. Instead, a SQL string is transformed into a numerical representation. This transformation is called query featurization and is defined by a query featurization technique (QFT). This paper is concerned with QFTs for queries with many selection predicates. In particular, we consider queries that contain both predicates over different attributes and multiple predicates per attribute. We identify a desired property of query featurization and present three novel QFTs. To the best of our knowledge, we are the first to featurize queries with mixed combinations of predicates, i.e., queries containing both conjunctions and disjunctions. Our QFTs are model-independent and can serve as the query featurization layer for different machine learning model types. In our evaluation, we combine our QFTs with three different machine learning models. We demonstrate that the estimation accuracy of machine learning models depends significantly on the QFT used. In addition, we compare our best combination of QFT and machine learning model to various existing cardinality estimators.
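To make the idea of query featurization concrete, the following is a minimal illustrative sketch. It is not one of the three QFTs presented in the paper. It assumes integer attribute domains, treats the range predicates on one attribute as a disjunction and the predicates across attributes as a conjunction, and encodes each attribute as a fixed number of buckets whose entries give the fraction of bucket values that qualify. The function name `featurize`, the query layout, and the bucket count are all hypothetical choices made for this example.

```python
def featurize(query, domains, buckets=4):
    """Turn a query into a fixed-length numeric vector (illustrative only).

    query:   {attribute: [(lo, hi), ...]}  inclusive ranges, OR-ed per attribute,
             AND-ed across attributes (assumed layout, not from the paper)
    domains: {attribute: (min, max)}       inclusive integer domain bounds
    """
    vector = []
    for attr in sorted(domains):  # fixed attribute order keeps positions stable
        dmin, dmax = domains[attr]
        # An attribute without predicates is treated as "whole domain qualifies".
        ranges = query.get(attr, [(dmin, dmax)])
        size = dmax - dmin + 1
        for b in range(buckets):
            start = dmin + (b * size) // buckets
            end = dmin + ((b + 1) * size) // buckets  # exclusive upper bound
            bucket_values = range(start, end)
            if len(bucket_values) == 0:
                vector.append(0.0)
                continue
            # Count the bucket values that satisfy at least one range.
            hits = sum(
                1 for v in bucket_values
                if any(lo <= v <= hi for lo, hi in ranges)
            )
            vector.append(hits / len(bucket_values))
    return vector


# Example: (age <= 1 OR age >= 6), with no predicate on year.
domains = {"age": (0, 7), "year": (0, 3)}
mixed = featurize({"age": [(0, 1), (6, 7)]}, domains)
```

A vector of this kind has the same length for every query over the same schema, which is what allows it to be fed to different model types. How well a given encoding preserves the information in the original predicates is the kind of question the paper investigates.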


Original language: English
Title of host publication: Proceedings of the 26th International Conference on Extending Database Technology (EDBT 2023)
Number of pages: 12
Publication status: Published - 28 Mar 2023

External IDs

Scopus 85150354662
ORCID /0000-0001-8107-2775/work/142253560

