Program analysis tools for massively parallel applications: How to achieve highest performance

Research output: Contribution to book/conference proceedings › Conference contribution › peer-reviewed



Today's HPC environments are increasingly complex in order to achieve the highest performance. Hardware platforms introduce features like out-of-order execution, multi-level caches, multiple cores, and non-uniform memory access. Application software combines OpenMP, MPI, optimized libraries, and various types of compiler optimization to exploit the potential performance.

To reach a reasonable percentage of the theoretical peak performance, three fundamental steps need to be accomplished. First, correctness must be guaranteed, especially during the course of optimization. Second, the actual performance achieved needs to be determined; in particular, the contributions and limitations of all sub-systems involved (CPU, memory, network, I/O) have to be identified. Third, actual optimization can only be successful with the previously obtained knowledge.

These steps are by no means trivial. There are sophisticated tools beyond simple profiling to support the HPC user. The tutorial introduces a variety of such tools, shows how they play together, and how they scale to long-running, massively parallel cases.
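The second step above, determining the performance actually achieved, can be sketched with a minimal timing harness. This is a generic illustration only, not one of the tools covered by the tutorial; the function names (`measure`, `summation`) are hypothetical.

```python
import time

def measure(kernel, *args, repeats=5):
    """Run kernel several times and report the best wall-clock time,
    reducing noise from the OS scheduler and cold caches."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = kernel(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

# Stand-in compute kernel; a real case would be an MPI/OpenMP region.
def summation(n):
    return sum(range(n))

value, seconds = measure(summation, 1_000_000)
print(value, f"{seconds:.6f}s")
```

Real HPC analysis tools go far beyond such hand-written timers by correlating measurements across processes, threads, and sub-systems, which is exactly the gap the tutorial addresses.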


Original language: English
Title of host publication: SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
Publisher: Association for Computing Machinery (ACM), New York
ISBN (print): 978-0-7695-2700-0
Publication status: Published - 2006

Publication series

Series: SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis

