Sample footprints für Data-Warehouse-Datenbanken
Publication: Contribution to journal › Research article › Contributed › Peer-reviewed
Abstract
With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. In this scenario, Linked Bernoulli Synopses provide memory-efficient schema-level synopses, i.e., synopses that consist of random samples of each table in the schema with minimal overhead for retaining foreign-key integrity within the synopsis. This efficiently supports the approximate answering of queries with arbitrary foreign-key joins. In this article, we focus on the application of Linked Bernoulli Synopses in data warehouse environments. On the one hand, we analyze the instantiation of memory-bounded synopses. Among other things, we address the following questions: How can the given space be partitioned among the individual samples? What is the impact on the overhead? On the other hand, we consider further adaptations of Linked Bernoulli Synopses for use in data warehouse databases. We show how synopses can be kept up to date incrementally when the underlying data changes. Further, we suggest additional outlier handling methods to reduce the estimation error of approximate answers to aggregation queries with foreign-key joins. With a variety of experiments, we show that Linked Bernoulli Synopses and the proposed tech
Details
| Original language | German |
|---|---|
| Pages (from - to) | 217-233 |
| Number of pages | 17 |
| Journal | Computer Science - Research and Development |
| Volume | 25 |
| Issue number | 3-4 |
| Publication status | Published - Sept. 2010 |
| Peer-review status | Yes |
External IDs
| ORCID | /0000-0001-8107-2775/work/199961310 |
|---|---|
Keywords
- Approximate query processing, Bernoulli sampling, Data warehouse databases, Outlier-aware sample synopses