A C++ Library for Memory Layout and Performance Portability of Scientific Applications

Research output: Contribution to book/conference proceedings/anthology/report > Conference contribution > Contributed > peer-review

Abstract

We present a C++14 library for performance portability of scientific computing codes across CPU and GPU architectures. Our library combines generic data structures, such as vectors, multi-dimensional arrays, maps, graphs, and sparse grids, with basic, reusable algorithms, such as convolutions, sorting, prefix sum, reductions, and scan. The memory layout of the data structures is adapted at compile time using tuples, with optional memory mirroring between CPU and GPU. We combine this transparent memory mapping with generic algorithms under two alternative programming interfaces: a CUDA-like kernel interface for multi-core CPUs, NVIDIA GPUs, and AMD GPUs, and a lambda interface. We validate and benchmark the presented library using micro-benchmarks, which show that the abstractions introduce negligible performance overhead, and we compare its performance against the current state of the art.

Details

Original language: English
Title of host publication: Euro-Par 2022
Editors: Jeremy Singer, Yehia Elkhatib, Dora Blanco Heras, Patrick Diehl, Nick Brown, Aleksandar Ilic
Publisher: Springer Science and Business Media B.V.
Pages: 109-120
Number of pages: 12
ISBN (print): 9783031312083
Publication status: Published, 2023
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13835 LNCS
ISSN: 0302-9743

Conference

Title: 28th International European Conference on Parallel and Distributed Computing, Euro-Par 2022
Duration: 22-26 August 2022
City: Glasgow
Country: United Kingdom

External IDs

ORCID: /0000-0003-4414-4340/work/159608271

Keywords

  • C++ tuples, generic algorithms, GPU, memory layout, multi-core, performance portability