PIKA: Center-Wide and Job-Aware Cluster Monitoring
Publication: Contribution to book/conference proceedings/anthology/expert opinion › Contribution to conference proceedings › Contributed › Peer-reviewed
Abstract
Nowadays, performance optimization is more or less an established procedure in high-performance computing (HPC) centers. To sustainably increase compute efficiency of such systems, we need to increase the awareness of efficiency on both the operator's and the users' side. Therefore, we propose an infrastructure for continuous monitoring and analysis, which automatically characterizes HPC jobs and provides a systematic approach to identify underperforming compute jobs with optimization potential. The recorded metadata and time-series data can be visualized live at runtime or post-mortem and are eventually stored for long-term analysis. The monitoring has a negligible overhead on the compute nodes and neither influences nor limits the user applications.
Details
Original language | English |
---|---|
Title | 2020 IEEE International Conference on Cluster Computing (CLUSTER) |
Publisher | IEEE Computer Society, Washington |
Pages | 424-432 |
Number of pages | 9 |
ISBN (electronic) | 978-1-7281-6677-3 |
ISBN (print) | 978-1-7281-6678-0 |
Publication status | Published 14 Sept. 2020 |
Peer-review status | Yes |
Publication series
Series | IEEE International Conference on Cluster Computing |
---|---|
ISSN | 1552-5244 |
Conference
Title | 2020 IEEE International Conference on Cluster Computing |
---|---|
Short title | CLUSTER 2020 |
Dates | 14-17 September 2020 |
Level of recognition | International event |
Location | online |
City | Kobe |
Country | Japan |
External IDs
Scopus | 85096230773 |
---|---|
WOS | 000698696500051 |
Keywords
- monitoring, data collection, data visualization, data analysis, collectd, LIKWID