Rapidgzip: Parallel Decompression and Seeking in Gzip Files Using Cache Prefetching
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Gzip is a file compression format, which is ubiquitously used. Although a multitude of gzip implementations exist, only pugz can fully utilize current multi-core processor architectures for decompression. Yet, pugz cannot decompress arbitrary gzip files. It requires the decompressed stream to only contain byte values 9-126. In this work, we present a generalization of the parallelization scheme used by pugz that can be reliably applied to arbitrary gzip-compressed data without compromising performance. We show that the requirements on the file contents posed by pugz can be dropped by implementing an architecture based on a cache and a parallelized prefetcher. This architecture can safely handle faulty decompression results, which can appear when threads start decompressing in the middle of a gzip file by using trial and error. Using 128 cores, our implementation reaches 8.7 GB/s decompression bandwidth for gzip-compressed base64-encoded data, a speedup of 55 over the single-threaded GNU gzip, and 5.6 GB/s for the Silesia corpus, a speedup of 33 over GNU gzip.
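The cache-plus-prefetcher idea from the abstract can be illustrated with a small Python sketch. This is not rapidgzip's implementation. It assumes the input has already been split into independently compressed gzip members, which sidesteps the hard part the paper addresses (guessing valid start positions inside a single gzip stream and discarding faulty results). The names `make_chunks` and `PrefetchingChunkReader` and all parameters are hypothetical and exist only for this example.

```python
import gzip
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def make_chunks(data, chunk_size):
    """Compress each slice as its own gzip member (assumption of this toy setup)."""
    return [gzip.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


class PrefetchingChunkReader:
    """Toy chunk cache with a thread-pool prefetcher (illustrative only)."""

    def __init__(self, chunks, workers=4, prefetch=2, cache_size=8):
        self.chunks = chunks            # independently decompressible members
        self.prefetch = prefetch        # number of chunks requested ahead
        self.cache_size = cache_size    # maximum number of cached results
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.cache = OrderedDict()      # chunk index -> Future yielding bytes

    def _submit(self, index):
        # Start decompression only if the chunk is not cached or in flight.
        if index not in self.cache:
            self.cache[index] = self.pool.submit(gzip.decompress, self.chunks[index])
        self.cache.move_to_end(index)   # mark as most recently used

    def get(self, index):
        # Request the wanted chunk and the next few, so that sequential
        # readers find later chunks already decompressed in the cache.
        last = min(index + self.prefetch, len(self.chunks) - 1)
        for i in range(index, last + 1):
            self._submit(i)
        result = self.cache[index].result()   # blocks until the chunk is ready
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)    # evict the least recently used entry
        return result

    def close(self):
        self.pool.shutdown(wait=True)


data = bytes(range(256)) * 1000
chunks = make_chunks(data, 4096)
reader = PrefetchingChunkReader(chunks)
restored = b"".join(reader.get(i) for i in range(len(chunks)))
random_access = reader.get(3)   # seeking back to an earlier chunk also works
reader.close()
```

Reading the chunks in order reproduces the original data, and any chunk can be fetched by index. In a real gzip file the chunk boundaries are not known in advance, so a worker that starts mid-stream may produce a wrong result; the architecture described in the abstract is what makes such results safe to handle.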
Details
Original language | English |
---|---|
Title | HPDC 2023 - Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing |
Place of publication | Orlando, FL, US |
Publisher | Association for Computing Machinery (ACM), New York |
Pages | 295-307 |
Number of pages | 13 |
ISBN (electronic) | 9798400701559 |
Publication status | Published - 19 June 2023 |
Peer-review status | Yes |
External IDs
Scopus | 85169612617 |
---|
Keywords
- decompression, gzip, parallel algorithm, performance, random access