Critical-blame analysis for OpenMP 4.0 offloading on Intel Xeon Phi
Research output: Other contribution › Other › Contributed › peer-review
Abstract
Recent supercomputers rated in the TOP500 list increasingly utilize accelerator or co-processor devices to improve performance and energy efficiency. Since version 4.0 of the specification, OpenMP has addressed this heterogeneity in computing with the target directives, which enable programmers to offload portions of the code to massively parallel target devices. Due to this new complexity in hardware and software design, performance optimization of large-scale parallel programs is becoming increasingly challenging. As manual performance analysis is becoming infeasible for complex high-performance computing (HPC) codes, we propose an approach to automatically detect bottlenecks such as load imbalances in heterogeneous OpenMP applications. We developed a method to perform critical-path and root-cause analysis for the OpenMP 4.0 offloading model and integrated it into the tool CASITA. The post-mortem analysis is based on execution traces that are generated with an implementation of the evolving OpenMP Tools Interface in the measurement system Score-P. To validate the implementation of our method, we ported several existing codes to OpenMP 4.0, executed them with an Intel Xeon Phi as the target device, and analyzed the resulting trace files.
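For readers unfamiliar with the target directives mentioned in the abstract, the following is a minimal sketch of an OpenMP 4.0 offloaded region, not code from the paper. The function name `offload_dot` and the dot-product workload are hypothetical and chosen only for illustration. In OpenMP 4.0 the host thread waits until the target region has completed, which is the kind of host-side waiting time that a critical-path and root-cause analysis would need to attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): dot product computed inside an
 * OpenMP 4.0 target region. If the compiler has no OpenMP support or no
 * device is available, the loop simply runs on the host. */
double offload_dot(const double *a, const double *b, int n)
{
    assert(a != NULL && b != NULL && n > 0);
    double sum = 0.0;

    /* Copy the inputs to the device, copy the scalar result both ways.
     * The host thread blocks here until the offloaded region finishes. */
    #pragma omp target map(to: a[0:n], b[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }

    return sum;
}
```

Calling `offload_dot(a, b, n)` from host code behaves like an ordinary function call. Whether the loop actually runs on a device such as a Xeon Phi depends on the compiler and its offloading flags, which are outside the scope of this sketch.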
Details
Original language | English |
---|---|
Number of pages | 8 |
Volume | 125 |
Publication status | Published - 2016 |
Peer-reviewed | Yes |
External IDs
Scopus | 84955312939 |
---|
Keywords
- hardware, energy efficiency, software