Critical-blame analysis for OpenMP 4.0 offloading on Intel Xeon Phi

Research output: Other contribution › Other › Contributed › peer-review

Contributors

  • R. Dietrich - (Author)
  • F. Schmitt - (Author)
  • A. Grund - (Author)
  • J. Stolle - (Author)

Abstract

Recent supercomputers rated in the TOP500 list increasingly utilize accelerator or co-processor devices to improve performance and energy efficiency. Since version 4.0 of the specification, OpenMP addresses this heterogeneity in computing with the target directives, which enable programmers to offload portions of the code to massively parallel target devices. Due to this new complexity in hardware and software design, performance optimization of large-scale parallel programs becomes more and more challenging. As manual performance analysis is becoming infeasible for complex high-performance computing (HPC) codes, we propose an approach to automatically detect bottlenecks such as load imbalances in heterogeneous OpenMP applications. We developed a method to perform critical-path and root-cause analysis for the OpenMP 4.0 offloading model and integrated it into the tool CASITA. The post-mortem analysis is based on execution traces that are generated with an implementation of the evolving OpenMP Tools Interface in the measurement system Score-P. To validate the implementation of our method, we ported several existing codes to OpenMP 4.0, executed them with an Intel Xeon Phi as the target device, and analyzed the resulting trace files.

Details

Original language: English
Number of pages: 8
Volume: 125
Publication status: Published - 2016
Peer-reviewed: Yes

External IDs

Scopus 84955312939

Keywords

  • hardware, energy efficiency, software