Cardinality estimation in ETL processes

Publication: Contribution to book/conference proceedings/anthology/report, Contribution to conference proceedings, Contributed, Peer-reviewed

Contributors

Abstract

Cardinality estimation in ETL processes is particularly difficult. Besides the well-known SQL operators, which are also used in ETL processes, there are many operators without exact counterparts in the relational world. In addition, there are operators that support very specific data integration aspects. For such operators, no well-examined statistical approaches to cardinality estimation exist. Therefore, we propose a black-box approach that estimates the cardinality of each operator using a set of statistical models. We discuss different model granularities and develop an adaptive cardinality estimation framework for ETL processes. We map the abstract model operators to specific statistical learning approaches (regression, decision trees, support vector machines, etc.) and evaluate our cardinality estimates in an extensive experimental study.
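The black-box idea can be illustrated with a minimal sketch. It assumes that each operator's output cardinality is predicted from its input cardinality alone, and that one simple linear regression stands in for the richer set of models (regression, decision trees, support vector machines) that the paper evaluates. All class, method and operator names are hypothetical; this is not the authors' framework.

```python
from collections import defaultdict


class OperatorCardinalityModel:
    """Hypothetical black-box model for one ETL operator.

    Fits output_rows ~ slope * input_rows + intercept by ordinary least
    squares over observed executions. It never inspects the operator's
    semantics, only its (input, output) cardinalities.
    """

    def __init__(self):
        self.samples = []
        # Default before any observation: pass the cardinality through unchanged.
        self.slope = 1.0
        self.intercept = 0.0

    def observe(self, input_rows, output_rows):
        self.samples.append((float(input_rows), float(output_rows)))
        # Refit after every observation: a crude stand-in for adaptivity.
        self._fit()

    def _fit(self):
        n = len(self.samples)
        if n == 1:
            # A single observation only yields a selectivity ratio.
            x, y = self.samples[0]
            self.slope = y / x if x else 1.0
            self.intercept = 0.0
            return
        mean_x = sum(x for x, _ in self.samples) / n
        mean_y = sum(y for _, y in self.samples) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.samples)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in self.samples)
        if sxx == 0:
            # All inputs identical: fall back to the average selectivity.
            self.slope = mean_y / mean_x if mean_x else 1.0
            self.intercept = 0.0
        else:
            self.slope = sxy / sxx
            self.intercept = mean_y - self.slope * mean_x

    def estimate(self, input_rows):
        # Cardinalities cannot be negative.
        return max(0.0, self.slope * input_rows + self.intercept)


class CardinalityEstimator:
    """Keeps one model per operator and chains them along a linear ETL flow."""

    def __init__(self):
        self.models = defaultdict(OperatorCardinalityModel)

    def observe(self, operator, input_rows, output_rows):
        self.models[operator].observe(input_rows, output_rows)

    def estimate_flow(self, operators, source_rows):
        rows = float(source_rows)
        for op in operators:
            # Each operator's estimated output feeds the next operator.
            rows = self.models[op].estimate(rows)
        return rows


# Usage with two hypothetical operators and synthetic observations.
est = CardinalityEstimator()
for n in (1000, 2000, 4000):
    est.observe("filter_invalid", n, 0.9 * n)         # drops about 10% of rows
    est.observe("dedup_customers", n, 0.5 * n + 100)  # duplicate elimination
estimate = est.estimate_flow(["filter_invalid", "dedup_customers"], 3000)
print(round(estimate))  # 1450
```

Swapping the linear fit for a decision tree or support vector regressor would only change `_fit` and `estimate`; the per-operator, black-box interface stays the same.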

Details

Original language: English
Title: DOLAP '09: Proceedings of the ACM twelfth international workshop on Data warehousing and OLAP
Pages: 57-64
Number of pages: 8
Publication status: Published - 2009
Peer-review status: Yes

Conference

Title: 12th ACM International Workshop on Data Warehousing and OLAP, DOLAP'09, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009
Duration: 2 - 6 November 2009
City: Hong Kong
Country: China

External IDs

ORCID /0000-0001-8107-2775/work/200630402

Keywords

  • Cardinality estimation, ETL, Real-time data warehouse