Sample footprints für Data-Warehouse-Datenbanken

Philipp Rösch; Wolfgang Lehner

doi:10.1007/s00450-009-0100-x

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Philipp Rösch, Technische Universität Dresden (Author)
Wolfgang Lehner, Chair of Databases (Author)

Abstract

With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. In this scenario, Linked Bernoulli Synopses provide memory-efficient schema-level synopses, i.e., synopses that consist of random samples of each table in the schema with minimal overhead for retaining foreign-key integrity within the synopsis. This provides efficient support for the approximate answering of queries with arbitrary foreign-key joins. In this article, we focus on the application of Linked Bernoulli Synopses in data warehouse environments. On the one hand, we analyze the instantiation of memory-bounded synopses. Among other things, we address the following questions: How can the given space be partitioned among the individual samples? What is the impact on the overhead? On the other hand, we consider further adaptations of Linked Bernoulli Synopses for usage in data warehouse databases. We show how synopses can be kept up-to-date incrementally when the underlying data changes. Further, we suggest additional outlier handling methods to reduce the estimation error of approximate answers to aggregation queries with foreign-key joins. With a variety of experiments, we show that Linked Bernoulli Synopses and the proposed tech
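To make the notion of "overhead for retaining foreign-key integrity" concrete, the following is a minimal sketch of a naive baseline, not the Linked Bernoulli Synopses algorithm from the article. It assumes a hypothetical two-table schema (a fact table referencing one dimension table), samples each table independently with Bernoulli sampling, and then adds every referenced dimension row that the dimension sample happens to miss. The function names, column names (`id`, `dim_id`, `amount`), and sampling rates are all illustrative assumptions.

```python
import random


def bernoulli_sample(rows, q, rng):
    """Bernoulli sampling: keep each row independently with probability q."""
    return [row for row in rows if rng.random() < q]


def naive_fk_synopsis(fact, dim, q_fact, q_dim, seed=0):
    """Naive baseline (assumption, not the article's method).

    Samples both tables independently, then collects the dimension rows
    that are referenced by sampled fact rows but absent from the
    dimension sample. These extra rows are the integrity overhead.
    """
    rng = random.Random(seed)
    fact_sample = bernoulli_sample(fact, q_fact, rng)
    dim_sample = bernoulli_sample(dim, q_dim, rng)

    sampled_keys = {d["id"] for d in dim_sample}
    needed_keys = {f["dim_id"] for f in fact_sample}
    missing_keys = needed_keys - sampled_keys

    # Reference rows exist only to keep foreign keys resolvable.
    reference_rows = [d for d in dim if d["id"] in missing_keys]
    return fact_sample, dim_sample, reference_rows


# Hypothetical toy data: 100 dimension rows, 10,000 fact rows.
dim = [{"id": i} for i in range(100)]
fact = [{"id": i, "dim_id": i % 100, "amount": float(i)} for i in range(10_000)]

fact_s, dim_s, ref = naive_fk_synopsis(fact, dim, q_fact=0.01, q_dim=0.1)
print(len(fact_s), len(dim_s), len(ref))
```

In this baseline the number of reference rows can approach the number of sampled fact rows, because the two samples are drawn independently. The abstract's claim of "minimal overhead" refers to reducing exactly this kind of extra storage.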

Details

Original language	German
Pages (from - to)	217-233
Number of pages	17
Journal	Computer Science - Research and Development
Volume	25
Issue number	3-4
Publication status	Published - Sept. 2010
Peer-review status	Yes

External IDs

ORCID	/0000-0001-8107-2775/work/199961310

Keywords

Approximate query processing, Bernoulli sampling, Data warehouse databases, Outlier-aware sample Synopses
