Sample footprints für Data-Warehouse-Datenbanken

Research output: Contribution to journal > Research article > Contributed > peer-review

Contributors

  • Philipp Rösch, TUD Dresden University of Technology (Author)
  • Wolfgang Lehner, Chair of Databases (Author)

Abstract

With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. In this scenario, Linked Bernoulli Synopses provide memory-efficient schema-level synopses, i.e., synopses that consist of random samples of each table in the schema with minimal overhead for retaining foreign-key integrity within the synopsis. This efficiently supports the approximate answering of queries with arbitrary foreign-key joins. In this article, we focus on the application of Linked Bernoulli Synopses in data warehouse environments. On the one hand, we analyze the instantiation of memory-bounded synopses. Among other things, we address the following questions: How can the given space be partitioned among the individual samples? What is the impact on the overhead? On the other hand, we consider further adaptations of Linked Bernoulli Synopses for use in data warehouse databases. We show how synopses can be kept up to date incrementally when the underlying data changes. Further, we suggest additional outlier handling methods to reduce the estimation error of approximate answers to aggregation queries with foreign-key joins. With a variety of experiments, we show that Linked Bernoulli Synopses and the proposed tech

Translated title of the contribution
Sample footprints for data warehouse databases

Details

Original language: German
Pages (from-to): 217-233
Number of pages: 17
Journal: Computer Science - Research and Development
Volume: 25
Issue number: 3-4
Publication status: Published - Sept 2010
Peer-reviewed: Yes

External IDs

ORCID /0000-0001-8107-2775/work/199961310

Keywords

  • Approximate query processing, Bernoulli sampling, Data warehouse databases, Outlier-aware sample synopses