FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
In this paper we describe the hardware and application-inherent challenges that future exascale systems pose to high-performance computing (HPC) and propose a system architecture that addresses them. This architecture is based on proven building blocks and few principles: (1) a fast light-weight kernel that is supported by a virtualized Linux for tasks that are not performance critical, (2) decentralized load and health management using fault-tolerant gossip-based information dissemination, (3) a maximally-parallel checkpoint store for cheap checkpoint/restart in the presence of frequent component failures, and (4) a runtime that enables applications to interact with the underlying system platform through new interfaces. The paper discusses the vision behind FFMK and the current state of a prototype implementation of the system, which is based on a microkernel and an adapted MPI runtime.
Details
Original language | English |
---|---|
Title | Software for Exascale Computing - SPPEXA 2013-2015 |
Publisher | Springer, Cham |
Pages | 405-426 |
Number of pages | 22 |
ISBN (Print) | 978-3-319-40526-1 |
Publication status | Published - 2016 |
Peer-review status | Yes |
Publication series
Series | Lecture Notes in Computational Science and Engineering (LNCSE) |
---|---|
Volume | 136 |
External IDs
Scopus | 84989926715 |
---|---|
ORCID | /0000-0003-3137-0648/work/142238849 |
Keywords
- MPI library, application, performance