Sample synopses for approximate answering of group-by queries
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Contributors
Abstract
With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. Typically, those analytical queries partition the data into groups and aggregate the values within the groups. Further, with the commonly used roll-up and drill-down operations, a broad range of group-by queries is posed to the system, which makes the construction of highly specialized synopses difficult. In this paper, we propose a general-purpose sampling scheme that is biased in order to answer group-by queries with high accuracy. While existing techniques focus on the size of the group when computing its sample size, our technique is based on its standard deviation. The basic idea is that the more homogeneous a group is, the fewer representatives are required in order to give a good estimate. With an extensive set of experiments, we show that our approach reduces both the estimation error and the construction cost compared to existing techniques.
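The basic idea in the abstract, that a homogeneous group needs fewer representatives than a heterogeneous one, can be illustrated with a small sketch. It is not the paper's algorithm: it assumes the simplest possible rule, splitting a sample budget across groups in proportion to each group's standard deviation. The names `allocate_by_stddev`, `build_sample`, `budget`, and `min_per_group` are hypothetical and chosen only for this illustration.

```python
import random
import statistics


def allocate_by_stddev(groups, budget, min_per_group=1):
    """Split a sample budget across groups in proportion to their standard deviation.

    groups: dict mapping a group key to a list of numeric values.
    budget: target total number of sampled rows (assumed parameter).
    """
    # Population standard deviation per group; a one-row group has spread 0.
    spread = {k: statistics.pstdev(v) if len(v) > 1 else 0.0
              for k, v in groups.items()}
    total = sum(spread.values())
    sizes = {}
    for key, values in groups.items():
        # If every group is constant, fall back to an even split.
        share = budget * spread[key] / total if total > 0 else budget / len(groups)
        # Keep at least min_per_group rows per group and never exceed the group size.
        sizes[key] = min(len(values), max(min_per_group, round(share)))
    return sizes


def build_sample(groups, sizes, seed=0):
    """Draw a uniform random sample without replacement inside each group."""
    rng = random.Random(seed)
    return {k: rng.sample(v, sizes[k]) for k, v in groups.items()}


if __name__ == "__main__":
    groups = {
        "homogeneous": [5.0] * 100,                      # all values identical
        "heterogeneous": [float(i) for i in range(100)]  # widely spread values
    }
    sizes = allocate_by_stddev(groups, budget=20)
    sample = build_sample(groups, sizes)
    for key in groups:
        print(key, sizes[key], statistics.mean(sample[key]))
```

In this toy input the constant group is represented by a single row, which already yields its exact average, while nearly the whole budget goes to the spread-out group. Because of rounding and the per-group minimum, the allocated sizes may not sum exactly to `budget`; a size-based scheme would instead give both groups the same share here, since they contain the same number of rows.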
Details
| Original language | English |
|---|---|
| Title of host publication | EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology |
| Pages | 403-414 |
| Number of pages | 12 |
| Publication status | Published - 2009 |
| Peer-reviewed | Yes |
Conference
| Title | 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT'09 |
|---|---|
| Duration | 24 - 26 March 2009 |
| City | Saint Petersburg |
| Country | Russian Federation |
External IDs
| ORCID | /0000-0001-8107-2775/work/200630386 |
|---|---|