Deferred maintenance of disk-based random samples
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for incremental maintenance of large random samples on secondary storage. We show that the sample maintenance cost can be reduced by refreshing the sample in a deferred manner. We introduce a novel type of log file which follows the intuition that only a "sample" of the operations on the base data has to be considered to maintain a random sample in a statistically correct way. Additionally, we develop a deferred refresh algorithm which updates the sample by using fast sequential disk access only, and which does not require any main memory. We conducted an extensive set of experiments and found that our algorithms reduce maintenance cost by several orders of magnitude.
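The intuition behind logging only a "sample" of the operations can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes an insert-only workload and standard reservoir sampling, keeps both the sample and the log in Python lists as stand-ins for disk files, and its refresh step uses random access to sample slots rather than the sequential-only refresh the abstract describes. The class name `DeferredReservoir` and all of its members are hypothetical.

```python
import random


class DeferredReservoir:
    """Insert-only reservoir sample with a candidate log (illustrative only)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity      # target sample size M
        self.sample = []              # stand-in for the on-disk sample
        self.log = []                 # stand-in for the on-disk candidate log
        self.seen = 0                 # number of base-data insertions so far
        self.rng = random.Random(seed)

    def insert(self, item):
        """Record an insertion; log it only if reservoir sampling accepts it."""
        self.seen += 1
        # Standard reservoir acceptance: the i-th item is accepted with
        # probability min(1, M / i). Rejected items never reach the log.
        if self.rng.random() < self.capacity / self.seen:
            self.log.append(item)

    def refresh(self):
        """Apply the logged candidates to the sample, in arrival order."""
        for item in self.log:
            if len(self.sample) < self.capacity:
                # Sample not yet full: the candidate takes a free slot.
                self.sample.append(item)
            else:
                # Sample full: the candidate overwrites a uniformly chosen slot.
                self.sample[self.rng.randrange(self.capacity)] = item
        self.log.clear()
```

Because the acceptance decision depends only on the running insertion count, deferring the slot replacements to `refresh()` yields the same sample distribution as applying them immediately. Calling `insert()` for every base-data insertion and `refresh()` only before the sample is read keeps the log far shorter than the full operation history once the dataset is much larger than the sample.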
Details
| Original language | English |
|---|---|
| Title | Advances in Database Technology - EDBT 2006 - 10th International Conference on Extending Database Technology, Proceedings |
| Pages | 423-441 |
| Number of pages | 19 |
| Publication status | Published - 2006 |
| Peer-review status | Yes |
Publication series
| Series | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 3896 LNCS |
| ISSN | 0302-9743 |
Conference
| Title | 10th International Conference on Extending Database Technology, EDBT 2006 |
|---|---|
| Duration | 26 - 31 March 2006 |
| City | Munich |
| Country | Germany |
External IDs
| ORCID | /0000-0001-8107-2775/work/200630407 |
|---|