Designing random sample synopses with outliers

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

  • Philipp Rösch, TUD Dresden University of Technology (Author)
  • Rainer Gemulla, TUD Dresden University of Technology (Author)
  • Wolfgang Lehner, Chair of Databases (Author)

Abstract

Random sampling is one of the most widely used means to build synopses of large datasets because random samples can be used for a wide range of analytical tasks. Unfortunately, the quality of the estimates derived from a sample is negatively affected by the presence of "outliers" in the data. In this paper, we show how to circumvent this shortcoming by constructing outlier-aware sample synopses. Our approach extends the well-known outlier indexing scheme to multiple aggregation columns.
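
The general idea behind outlier indexing for a single aggregation column can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the function names `build_synopsis` and `estimate_sum` are hypothetical, outliers are chosen by a simple heuristic (the values farthest from the median), and the multi-column extension described in the abstract is not covered.

```python
import random
import statistics


def build_synopsis(values, num_outliers, sample_size, seed=0):
    """Split values into an exactly stored outlier set and a uniform sample of the rest."""
    median = statistics.median(values)
    # Assumption: "outliers" are simply the values farthest from the median.
    ranked = sorted(range(len(values)),
                    key=lambda i: abs(values[i] - median),
                    reverse=True)
    outlier_idx = set(ranked[:num_outliers])
    outliers = [values[i] for i in outlier_idx]
    rest = [v for i, v in enumerate(values) if i not in outlier_idx]
    # Uniform random sample without replacement from the non-outlier values.
    rng = random.Random(seed)
    sample = rng.sample(rest, min(sample_size, len(rest)))
    return {"outliers": outliers, "sample": sample, "rest_count": len(rest)}


def estimate_sum(synopsis):
    """Exact sum over the outliers plus a scaled-up sum over the sample."""
    sample = synopsis["sample"]
    scaled = 0.0
    if sample:
        scaled = synopsis["rest_count"] * (sum(sample) / len(sample))
    return sum(synopsis["outliers"]) + scaled


values = [10] * 1000 + [1_000_000, 2_000_000]
synopsis = build_synopsis(values, num_outliers=2, sample_size=50)
print(estimate_sum(synopsis))
```

Because the two extreme values are stored exactly, they no longer inflate the variance of the sample-based part of the estimate; a plain uniform sample of the same data would either miss them or scale them up heavily.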

Details

Original language: English
Title of host publication: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Pages: 1400-1402
Number of pages: 3
Publication status: Published - 2008
Peer-reviewed: Yes

Publication series

Series: International Conference on Data Engineering (ICDE)
ISSN: 1063-6382

Conference

Title: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Duration: 7 - 12 April 2008
City: Cancun
Country: Mexico

External IDs

Scopus: 52649105862
ORCID: /0000-0001-8107-2775/work/199215652
