Integrating Lightweight Compression Capabilities into Apache Arrow
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
With the ongoing shift to a data-driven world in almost all application domains, the management and in particular the analysis of large amounts of data are gaining in importance. For that reason, a variety of new big data systems have been developed in recent years. In addition, a revision of data organization and data formats has been initiated as a foundation for these big data systems. In this context, Apache Arrow is a novel cross-language development platform for in-memory data with a standardized, language-independent columnar memory format. The data is organized for efficient analytic operations on modern hardware, but Apache Arrow supports only dictionary encoding as a specific compression approach. However, there exists a large corpus of lightweight compression algorithms for columnar data that help both to reduce the required memory space and to increase processing performance. In this paper, we therefore present a flexible and language-independent approach for integrating lightweight compression algorithms into the Apache Arrow framework. With our so-called ArrowComp approach, we preserve the unique properties of Apache Arrow while enhancing the platform with a large variety of lightweight compression capabilities.
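To make "lightweight compression algorithm for columnar data" concrete, the following is a minimal sketch of one classic scheme of this family, frame-of-reference (FOR) encoding combined with bit packing, written in plain Python. It is an illustration only: it is not the ArrowComp implementation and does not use any Apache Arrow API, and the function names `for_bitpack_encode` and `for_bitpack_decode` are hypothetical. The idea is to subtract the column minimum (the "reference") from every value and then store each non-negative offset with the smallest fixed bit width that fits the largest offset.

```python
from typing import List, Tuple


def for_bitpack_encode(values: List[int]) -> Tuple[int, int, int, bytes]:
    """Hypothetical FOR + bit-packing encoder for an integer column.

    Returns (reference, bit_width, count, packed_bytes).
    """
    if not values:
        return 0, 0, 0, b""
    reference = min(values)                 # frame of reference
    offsets = [v - reference for v in values]  # all offsets are >= 0
    bit_width = max(offsets).bit_length()   # 0 if all values are equal
    packed = 0
    for i, off in enumerate(offsets):
        # Place the i-th offset at bit position i * bit_width.
        packed |= off << (i * bit_width)
    n_bytes = (len(values) * bit_width + 7) // 8
    return reference, bit_width, len(values), packed.to_bytes(n_bytes, "little")


def for_bitpack_decode(reference: int, bit_width: int, count: int,
                       data: bytes) -> List[int]:
    """Inverse of for_bitpack_encode: unpack offsets and add the reference."""
    packed = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [reference + ((packed >> (i * bit_width)) & mask)
            for i in range(count)]


if __name__ == "__main__":
    column = [1000, 1003, 1001, 1007]
    ref, width, n, data = for_bitpack_encode(column)
    # Four offsets at 3 bits each fit into 2 bytes instead of 4 x 8 bytes.
    print(ref, width, n, len(data))
    print(for_bitpack_decode(ref, width, n, data))
```

For the sample column, the offsets are 0, 3, 1 and 7, so 3 bits per value suffice and the four values occupy 2 bytes rather than 32 bytes as 64-bit integers. Real implementations pack into fixed-size words and use SIMD for decoding; this sketch favors readability over speed.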
Details
Original language | English |
---|---|
Title | DATA 2020 - Proceedings of the 9th International Conference on Data Science, Technology and Applications |
Editors | Slimane Hammoudi, Christoph Quix, Jorge Bernardino |
Publisher | SCITEPRESS - Science and Technology Publications |
Pages | 55-66 |
Number of pages | 12 |
ISBN (electronic) | 9789897584404 |
Publication status | Published, 2020 |
Peer-reviewed | Yes |
Conference
Title | 9th International Conference on Data Science, Technology and Applications, DATA 2020 |
---|---|
Dates | 7-9 July 2020 |
City | Virtual, Online |
Country | France |
External IDs
dblp | conf/data/HildebrandtHL20 |
---|---|
Scopus | 85091968887 |
ORCID | /0000-0001-8107-2775/work/142253553 |
Keywords
ASJC Scopus subject areas
Author keywords
- Apache Arrow, Columnar data, Data formats, Integration, Lightweight compression