Designing random sample synopses with outliers
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Contributors
Abstract
Random sampling is one of the most widely used means to build synopses of large datasets because random samples can be used for a wide range of analytical tasks. Unfortunately, the quality of the estimates derived from a sample is negatively affected by the presence of "outliers" in the data. In this paper, we show how to circumvent this shortcoming by constructing outlier-aware sample synopses. Our approach extends the well-known outlier indexing scheme to multiple aggregation columns.
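The general idea behind outlier indexing can be illustrated with a minimal single-column sketch: store the most extreme values exactly, draw a uniform sample from the remaining values, and estimate a SUM as the exact outlier total plus the scaled-up sample total. This is not the paper's method. The function names `build_synopsis` and `estimate_sum`, the dictionary layout, and the rule of picking outliers by distance from the mean are all assumptions made for illustration, and the multi-column extension described in the abstract is not shown.

```python
import random
import statistics


def build_synopsis(values, num_outliers, sample_size, seed=0):
    """Split values into exactly stored outliers and a uniform sample of the rest.

    Hypothetical helper: outliers are chosen by largest absolute deviation
    from the mean, a simplified heuristic rather than the paper's rule.
    """
    rng = random.Random(seed)
    mean = statistics.fmean(values)
    # Rank positions by how far each value lies from the mean, farthest first.
    order = sorted(range(len(values)),
                   key=lambda i: abs(values[i] - mean),
                   reverse=True)
    outlier_idx = set(order[:num_outliers])
    outliers = [values[i] for i in outlier_idx]
    rest = [v for i, v in enumerate(values) if i not in outlier_idx]
    # Uniform sample without replacement from the non-outlier values.
    sample = rng.sample(rest, min(sample_size, len(rest)))
    return {"outliers": outliers, "sample": sample, "rest_count": len(rest)}


def estimate_sum(synopsis):
    """Estimate SUM: exact outlier total plus scaled sample total."""
    sample = synopsis["sample"]
    scaled = 0.0
    if sample:
        scaled = sum(sample) * synopsis["rest_count"] / len(sample)
    return sum(synopsis["outliers"]) + scaled
```

For example, `estimate_sum(build_synopsis(values, 2, 10))` estimates the column total from two stored outliers and ten sampled values. Because the extreme values are stored exactly, they no longer inflate the variance of the sample-based part of the estimate.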
Details
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 |
| Pages | 1400-1402 |
| Number of pages | 3 |
| Publication status | Published - 2008 |
| Peer-reviewed | Yes |
Publication series
| Series | International Conference on Data Engineering (ICDE) |
|---|---|
| ISSN | 1063-6382 |
Conference
| Title | 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 |
|---|---|
| Duration | 7 - 12 April 2008 |
| City | Cancun |
| Country | Mexico |
External IDs
| Scopus | 52649105862 |
|---|---|
| ORCID | /0000-0001-8107-2775/work/199215652 |