Deferred maintenance of disk-based random samples

Publikation: Beitrag in Buch/Konferenzbericht/Sammelband/GutachtenBeitrag in KonferenzbandBeigetragenBegutachtung

Beitragende

Abstract

Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for incremental maintenance of large random samples on secondary storage. We show that the sample maintenance cost can be reduced by refreshing the sample in a deferred manner. We introduce a novel type of log file which follows the intuition that only a "sample" of the operations on the base data has to be considered to maintain a random sample in a statistically correct way. Additionally, we develop a deferred refresh algorithm which updates the sample by using fast sequential disk access only, and which does not require any main memory. We conducted an extensive set of experiments and found, that our algorithms reduce maintenance cost by several orders of magnitude.

Details

OriginalspracheEnglisch
TitelAdvances in Database Technology - EDBT 2006 - 10th International Conference on Extending Database Technology, Proceedings
Seiten423-441
Seitenumfang19
PublikationsstatusVeröffentlicht - 2006
Peer-Review-StatusJa

Publikationsreihe

ReiheLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Band3896 LNCS
ISSN0302-9743

Konferenz

Titel10th International Conference on Extending Database Technology, EDBT 2006
Dauer26 - 31 März 2006
StadtMunich
LandDeutschland

Externe IDs

ORCID /0000-0001-8107-2775/work/200630407

Schlagworte

Forschungsprofillinien der TU Dresden

Fächergruppen, Lehr- und Forschungsbereiche, Fachgebiete nach Destatis