A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs

Pietro Incardona; Aryaman Gupta; Serhii Yaskovets; Ivo F. Sbalzarini

doi:10.1002/cpe.7870

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Pietro Incardona - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden), Chair of Scientific Computing for Systems Biology, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
Aryaman Gupta - Chair of Scientific Computing for Systems Biology, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
Serhii Yaskovets - Chair of Scientific Computing for Systems Biology, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
Ivo F. Sbalzarini - Chair of Scientific Computing for Systems Biology, Technische Universität Dresden, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)

Abstract

We present a C++ library for transparent memory and compute abstraction across CPU and GPU architectures. Our library combines generic data structures like vectors, multi-dimensional arrays, maps, graphs, and sparse grids with basic generic algorithms like arbitrary-dimensional convolutions, copying, merging, sorting, prefix sum, reductions, neighbor search, and filtering. The memory layout of the data structures is adapted at compile time using C++ tuples with optional memory double-mapping between host and device and the capability of using memory managed by external libraries with no data copying. We combine this transparent memory layout with generic thread-parallel algorithms under two alternative common interfaces: a CUDA-like kernel interface and a lambda-function interface. We quantify the memory and compute performance and portability of our implementation using micro-benchmarks, showing that the abstractions introduce negligible performance overhead, and we compare performance against the current state of the art in a real-world scientific application from computational fluid mechanics.
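The idea of selecting a memory layout at compile time with C++ tuples, and of writing one lambda-based algorithm that runs unchanged on either layout, can be illustrated with a minimal sketch. All names here (`layout_aos`, `layout_soa`, `prop_vector`, `for_each_index`, `fill_and_sum`) are hypothetical and are not taken from the library; the serial loop merely stands in for whatever thread-parallel backend a real implementation would dispatch to.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical layout tags, chosen at compile time (not the library's names).
struct layout_aos {};  // array of structures
struct layout_soa {};  // structure of arrays

template <typename Layout, typename... Props>
class prop_vector;

// Array of structures: one contiguous vector whose elements are tuples.
template <typename... Props>
class prop_vector<layout_aos, Props...> {
    std::vector<std::tuple<Props...>> data_;
public:
    explicit prop_vector(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }

    // Access property I of element i.
    template <std::size_t I>
    auto& get(std::size_t i) {
        assert(i < data_.size());
        return std::get<I>(data_[i]);
    }
};

// Structure of arrays: a tuple holding one vector per property.
template <typename... Props>
class prop_vector<layout_soa, Props...> {
    std::tuple<std::vector<Props>...> data_;
    std::size_t n_;
public:
    explicit prop_vector(std::size_t n)
        : data_(std::vector<Props>(n)...), n_(n) {}
    std::size_t size() const { return n_; }

    // Same accessor signature as the AoS specialization.
    template <std::size_t I>
    auto& get(std::size_t i) {
        assert(i < n_);
        return std::get<I>(data_)[i];
    }
};

// Lambda-style interface: apply f to every index.
// Serial stand-in for a thread-parallel or GPU backend (an assumption).
template <typename F>
void for_each_index(std::size_t n, F&& f) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

// Layout-agnostic user code: fill two properties, then reduce their product.
template <typename Vec>
double fill_and_sum(Vec& v) {
    for_each_index(v.size(), [&](std::size_t i) {
        v.template get<0>(i) = static_cast<double>(i);
        v.template get<1>(i) = 2;
    });
    double sum = 0.0;
    for_each_index(v.size(), [&](std::size_t i) {
        sum += v.template get<0>(i) * v.template get<1>(i);
    });
    return sum;
}
```

Instantiating `prop_vector<layout_aos, double, int>` or `prop_vector<layout_soa, double, int>` and passing either to `fill_and_sum` gives the same result; only the template tag changes, which is the sense in which the layout is transparent to the algorithm in this sketch.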

Details

Original language	English
Article number	e7870
Journal	Concurrency and Computation: Practice and Experience
Volume	35
Issue number	25
Publication status	Published - 15 Nov 2023
Peer-review status	Yes

External IDs

ORCID	/0000-0003-4414-4340/work/159608268

Subject headings

ASJC Scopus subject areas

Software
Theoretical Computer Science
Computer Science Applications
Computer Networks and Communications
Computational Theory and Mathematics

Keywords

C++ tuples, generic algorithms, GPU, memory layout, multi-core, performance portability

Library keywords

004 Computer science

TU Dresden Research Portal