A C++ Library for Memory Layout and Performance Portability of Scientific Applications
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
We present a C++14 library for performance portability of scientific computing codes across CPU and GPU architectures. Our library combines generic data structures like vectors, multi-dimensional arrays, maps, graphs, and sparse grids with basic, reusable algorithms like convolutions, sorting, prefix sum, reductions, and scan. The memory layout of the data structures is adapted at compile-time using tuples with optional memory mirroring between CPU and GPU. We combine this transparent memory mapping with generic algorithms under two alternative programming interfaces: a CUDA-like kernel interface for multi-core CPUs, Nvidia GPUs, and AMD GPUs, as well as a lambda interface. We validate and benchmark the presented library using micro-benchmarks, showing that the abstractions introduce negligible performance overhead, and we compare performance against the current state of the art.
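The compile-time layout adaptation described in the abstract can be illustrated with a minimal C++14 sketch. It is not the library's API: the names `layout_aos`, `layout_soa`, `particle_store`, `for_each_index`, and `weighted_sum` are hypothetical and introduced here only for illustration. The sketch assumes a container whose storage is either a vector of tuples (array of structures) or a tuple of vectors (structure of arrays), chosen by a template tag, with a lambda-based loop that is identical for both layouts. It runs on the CPU only and does not attempt the CPU/GPU memory mirroring or the CUDA-like kernel interface mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical layout tags; the names are assumptions, not the library's.
struct layout_aos {};  // array of structures
struct layout_soa {};  // structure of arrays

template <typename Layout, typename... Ts>
class particle_store;

// Array of structures: one vector whose elements are tuples.
template <typename... Ts>
class particle_store<layout_aos, Ts...> {
    std::vector<std::tuple<Ts...>> data_;
public:
    explicit particle_store(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }

    // Access property P of element i.
    template <std::size_t P>
    auto& get(std::size_t i) { return std::get<P>(data_[i]); }
};

// Structure of arrays: one tuple whose members are vectors.
template <typename... Ts>
class particle_store<layout_soa, Ts...> {
    std::tuple<std::vector<Ts>...> data_;
    std::size_t n_;
public:
    explicit particle_store(std::size_t n)
        : data_(std::vector<Ts>(n)...), n_(n) {}
    std::size_t size() const { return n_; }

    // Same accessor signature as the AoS variant, different storage.
    template <std::size_t P>
    auto& get(std::size_t i) { return std::get<P>(data_)[i]; }
};

// Lambda-style interface: apply f(store, i) to every index.
template <typename Store, typename F>
void for_each_index(Store& s, F f) {
    for (std::size_t i = 0; i < s.size(); ++i) f(s, i);
}

// The same generic code runs against either layout.
template <typename Layout>
double weighted_sum(std::size_t n) {
    particle_store<Layout, double, int> s(n);
    for_each_index(s, [](auto& st, std::size_t i) {
        st.template get<0>(i) = 0.5 * static_cast<double>(i);
        st.template get<1>(i) = 2;
    });
    double total = 0.0;
    for_each_index(s, [&total](auto& st, std::size_t i) {
        total += st.template get<0>(i) * st.template get<1>(i);
    });
    return total;
}
```

Calling `weighted_sum<layout_aos>(4)` and `weighted_sum<layout_soa>(4)` executes the same user code against both storage schemes. Because the layout is a template parameter, the choice is resolved at compile time and the accessor adds no runtime dispatch in this sketch.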
Details
Original language | English |
---|---|
Title of host publication | Euro-Par 2022 |
Editors | Jeremy Singer, Yehia Elkhatib, Dora Blanco Heras, Patrick Diehl, Nick Brown, Aleksandar Ilic |
Publisher | Springer Science and Business Media B.V. |
Pages | 109-120 |
Number of pages | 12 |
ISBN (print) | 9783031312083 |
Publication status | Published - 2023 |
Peer-reviewed | Yes |
Publication series
Series | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 13835 LNCS |
ISSN | 0302-9743 |
Conference
Title | 28th International European Conference on Parallel and Distributed Computing, Euro-Par 2022 |
---|---|
Duration | 22 - 26 August 2022 |
City | Glasgow |
Country | United Kingdom |
External IDs
ORCID | /0000-0003-4414-4340/work/159608271 |
---|---|
Keywords
- C++ tuples, generic algorithms, GPU, memory layout, multi-core, performance portability