Quantifying Performance and Scalability of the Distributed Monitoring Infrastructure SLAte

Research output: Contribution to journal, Conference article, Contributed, peer-reviewed

Abstract

Job-centric monitoring makes it possible to observe the execution of programs and services (so-called jobs) on remote and local computing resources. Large installations in particular, such as Grids, Clouds and HPC systems running many thousands of jobs, can benefit greatly from intelligent visualisations of recorded monitoring data and from semi-automatic analyses. The latter can reveal misbehaving jobs or non-optimal job execution and enable future optimisations that make more efficient use of the allocated resources. The challenge for job-centric monitoring infrastructures is to store, search and access the data collected on huge installations. We address this challenge with a distributed, layer-based architecture that provides a uniform view of all monitoring data. This paper presents the concept of this infrastructure, called SLAte, a performance evaluation, and the consequences for scalability.

Details

Original language: English
Journal: PARS-Mitteilungen
Volume: 32
Issue number: 1
Publication status: Published - 2015
Peer-reviewed: Yes