Data compression has recently experienced a revival in the domain of in-memory column stores. In this field, a large corpus of lightweight integer compression algorithms plays a dominant role since all columns are typically encoded as sequences of integer values. Unfortunately, there is no single-best integer compression algorithm and the best algorithm depends on data and hardware properties. For this reason, selecting the best-fitting integer compression algorithm becomes more important and is an interesting tuning knob for optimization. However, traditional selection strategies require a profound knowledge of the (de-)compression algorithms for decision-making. This limits the broad applicability of the selection strategies. To counteract this, we propose a novel learned selection strategy by considering integer compression algorithms as independent black boxes. This black-box approach ensures broad applicability and requires machine learning-based methods to model the required knowledge for decision-making. Most importantly, we show that a local approach, where every algorithm is modeled individually, plays a crucial role. Moreover, our learned selection strategy is generalized by user-data-independence. Finally, we evaluate our approach and compare our approach against existing selection strategies to show the benefits of our learned selection strategy.
|26th International Conference on Extending Database Technology (EDBT 2023)
|Veröffentlicht - 28 März 2023