Today's HPC environments are increasingly complex in the pursuit of the highest possible performance. Hardware platforms introduce features such as out-of-order execution, multi-level caches, multiple cores, and non-uniform memory access. Application software combines OpenMP, MPI, optimized libraries, and various compiler optimizations to exploit the available performance.

To reach a reasonable fraction of the theoretical peak performance, three fundamental steps must be accomplished. First, correctness must be guaranteed, especially during the course of optimization. Second, the performance actually achieved must be determined; in particular, the contributions and limitations of all sub-systems involved (CPU, memory, network, I/O) have to be identified. Third, the optimization itself can only succeed with this previously obtained knowledge.

These steps are by no means trivial. Sophisticated tools beyond simple profiling exist to support the HPC user. This tutorial introduces a variety of such tools, shows how they play together, and examines how they scale to long-running, massively parallel cases.
Title of host publication: SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
Publisher: Association for Computing Machinery (ACM), New York
Publication status: Published - 2006
Series: SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis