In recent years, High Performance Computing (HPC) has become increasingly important for many industries and research areas beyond ‘classic’ applications. As new domains emerge, applications, implementations, and frameworks become more diverse. Generic performance analysis tools often cannot keep up with the development speed of new approaches for workload distribution, offloading, and communication. Some of these new approaches employ their own performance monitoring, which is difficult to integrate into generic tools designed for traditional HPC. Performance measurements therefore often result in a collection of separate performance logs that logically form a unit but cannot be investigated together intuitively with established performance tools. In this paper, we present a tool library that combines separate performance logs and separately recorded metrics into a single performance log, enabling investigation of such performance data as a unit. Use cases from Big Data processing and AI show the broad applicability of our approach.
|Title of host publication|Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
|Association for Computing Machinery (ACM), New York
|Published - 12 Nov 2023
|SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis