Quantifying Performance and Scalability of the Distributed Monitoring Infrastructure SLAte

Marcus Hilbrich; Ralph Müller-Pfefferkorn

Publication: Contribution to journal › Conference article › Contributed › Peer-reviewed

Contributors

Marcus Hilbrich, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) (Author)
Ralph Müller-Pfefferkorn, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) (Author)

Abstract

Job-centric monitoring makes it possible to observe the execution of programs and services (so-called jobs) on remote and local computing resources. Large installations in particular, such as Grids, Clouds and HPC systems with many thousands of jobs, can benefit greatly from intelligent visualisations of recorded monitoring data and from semi-automatic analyses. The latter can reveal misbehaving jobs or non-optimal job execution and enable future optimisations that lead to a more efficient use of the allocated resources. The challenge for job-centric monitoring infrastructures is to store, search and access data collected on huge installations. We address this challenge with a distributed, layer-based architecture which provides a uniform view of all monitoring data. This paper presents the concept of this infrastructure, called SLAte, a performance evaluation, and the consequences for scalability.

Details

Original language	English
Journal	PARS-Mitteilungen
Volume	32
Issue number	1
Publication status	Published - 2015
Peer review status	Yes

External IDs

ORCID	/0000-0001-8719-5741/work/173053632

Keywords

Research priority areas of TU Dresden

Information Technology and Microelectronics