Sample footprints für Data-Warehouse-Datenbanken

Publikation: Beitrag in FachzeitschriftForschungsartikelBeigetragenBegutachtung

Beitragende

Abstract

With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. In this scenario, Linked Bernoulli Synopses provide memory-efficient schema-level synopses, i. e., synopses that consist of random samples of each table in the schema with minimal overhead for retaining foreign-key integrity within the synopsis. This provides efficient support to the approximate answering of queries with arbitrary foreign-key joins. In this article, we focus on the application of Linked Bernoulli Synopses in data warehouse environments. On the one hand, we analyze the instantiation of memory-bounded synopses. Among others, we address the following questions: How can the given space be partitioned among the individual samples? What is the impact on the overhead? On the other hand, we consider further adaptations of Linked Bernoulli Synopses for usage in data warehouse databases. We show how synopses can incrementally be kept up-todate when the underlying data changes. Further, we suggest additional outlier handling methods to reduce the estimation error of approximate answers of aggregation queries with foreign-key joins. With a variety of experiments, we show that Linked Bernoulli Synopses and the proposed tech

Details

OriginalspracheDeutsch
Seiten (von - bis)217-233
Seitenumfang17
FachzeitschriftComputer Science - Research and Development
Jahrgang25
Ausgabenummer3-4
PublikationsstatusVeröffentlicht - Sept. 2010
Peer-Review-StatusJa

Externe IDs

ORCID /0000-0001-8107-2775/work/199961310

Schlagworte

Forschungsprofillinien der TU Dresden

Fächergruppen, Lehr- und Forschungsbereiche, Fachgebiete nach Destatis

ASJC Scopus Sachgebiete

Schlagwörter

  • Approximate query processing, Bernoulli sampling, Data warehouse databases, Outlier-aware sample Synopses