Performance analysis of multi-level parallelism: inter-node, intra-node and hardware accelerators

Research output: Other contribution › Other › Contributed › peer-review

Abstract

The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise of hardware accelerators, hybrid parallelism adds yet another dimension of complexity to software development. The inner workings of a parallel program are usually difficult to understand and verify. This paper presents a tool for graphical program flow analysis of hardware-accelerated parallel programs. It monitors hybrid program execution to record and visualize many performance-relevant events along the way. Representative real-world applications written for both IBM's Cell processor and NVIDIA's CUDA API are studied as examples. With our combined monitoring and visualization approach for hardware-accelerated multi-core and multi-node systems, we take the next step in tool evolution towards a highly improved level of detail, precision, and completeness. The contents of this paper are of interest to developers of hardware-accelerated applications as well as performance tool architects.

Details

Original language: English
Number of pages: 11
Volume: Vol. 24
Publication status: Published - 2012
Peer-reviewed: Yes

External IDs

Scopus 84855220688
ORCID /0000-0002-8491-770X/work/141543265

Keywords

  • performance analysis, hardware