HPC systems today often combine general-purpose processors with additional accelerator devices. For performance and energy efficiency, parallel codes need to exploit the available hardware resources as fully as possible. A wide range of application programming interfaces (APIs) exists for programming these different compute resources, some of which are vendor-specific, such as CUDA for NVIDIA graphics processors. Consequently, implementing portable applications for heterogeneous architectures requires substantial effort and possibly several code bases, which often cannot be properly maintained due to limited developer resources. Abstraction layers such as Kokkos promise platform independence of application code and thereby reduce the need to port the code again for each new accelerator platform. The abstraction layer maps abstract code statements onto specific APIs. Unfortunately, this abstraction does not guarantee efficient execution on every platform, so performance tuning is still required. For this purpose, Kokkos provides a profiling interface that allows performance tools to acquire detailed information about Kokkos activity, closing the gap between program code and back-end API. In this paper, we introduce support for the Kokkos profiling interface in the Score-P measurement infrastructure, which enables performance analysis of Kokkos applications with a wide range of tools.
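To make the role of the profiling interface concrete, the following is a minimal sketch of a tool library that receives Kokkos callbacks. It is not the Score-P implementation described in the paper. It uses the callback names of the Kokkos Tools interface (`kokkosp_init_library`, `kokkosp_begin_parallel_for`, `kokkosp_end_parallel_for`, `kokkosp_finalize_library`). The bookkeeping structures and the helper `recorded_calls` are hypothetical and exist only for illustration. The sketch assumes callbacks arrive from a single host thread.

```cpp
// Minimal sketch of a Kokkos Tools callback library (illustrative only,
// not the Score-P adapter). Assumes single-threaded callback delivery.
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

namespace {
using Clock = std::chrono::steady_clock;

// Hypothetical bookkeeping types, introduced for this example.
struct KernelStats {
  std::uint64_t calls = 0;
  double host_seconds = 0.0;  // host-side wall time between begin and end
};
struct ActiveKernel {
  std::string name;
  Clock::time_point start;
};

std::map<std::string, KernelStats> g_stats;       // totals keyed by kernel name
std::map<std::uint64_t, ActiveKernel> g_active;   // kernels between begin and end
std::uint64_t g_next_id = 0;
}  // namespace

// Hypothetical helper: number of completed invocations recorded for a kernel.
std::uint64_t recorded_calls(const std::string& name) {
  const auto it = g_stats.find(name);
  return it == g_stats.end() ? 0 : it->second.calls;
}

// Called once when Kokkos loads the tool.
extern "C" void kokkosp_init_library(const int /*loadSeq*/,
                                     const std::uint64_t /*interfaceVer*/,
                                     const std::uint32_t /*devInfoCount*/,
                                     void* /*deviceInfo*/) {
  g_stats.clear();
  g_active.clear();
  g_next_id = 0;
}

// Called before a parallel_for is dispatched; the tool hands back an ID
// through kID, which Kokkos passes to the matching end callback.
extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const std::uint32_t /*devID*/,
                                           std::uint64_t* kID) {
  assert(kID != nullptr);
  *kID = g_next_id++;
  g_active[*kID] = ActiveKernel{name ? name : "(unnamed)", Clock::now()};
}

// Called after the parallel_for; accumulates per-kernel statistics.
extern "C" void kokkosp_end_parallel_for(const std::uint64_t kID) {
  const auto it = g_active.find(kID);
  if (it == g_active.end()) return;  // unmatched end event: ignore
  const std::chrono::duration<double> elapsed = Clock::now() - it->second.start;
  KernelStats& stats = g_stats[it->second.name];
  stats.calls += 1;
  stats.host_seconds += elapsed.count();
  g_active.erase(it);
}

// Called once when Kokkos shuts down; prints a simple summary.
extern "C" void kokkosp_finalize_library() {
  for (const auto& entry : g_stats) {
    std::printf("%s: %llu call(s), %.6f s on host\n", entry.first.c_str(),
                static_cast<unsigned long long>(entry.second.calls),
                entry.second.host_seconds);
  }
}
```

Such a file would typically be compiled as a shared library and made known to a Kokkos application through the `KOKKOS_TOOLS_LIBS` environment variable. The measured time is taken on the host, so whether it covers device execution depends on how Kokkos synchronizes around the callbacks. A measurement infrastructure would record enter and leave events for each region rather than print a summary.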
|Title of host publication|Tools for High Performance Computing 2018/2019|
|Number of pages|14|
|Publication status|Published - 2021|