Trading fault tolerance for performance in AN encoding
Publication: Contribution to book/conference proceedings/edited volume/expert opinion › Conference paper › Contributed › Peer-reviewed
Contributors
Abstract
Increasing rates of transient hardware faults pose a problem for computing applications. Current and future trends are likely to exacerbate this problem. When a transient fault occurs during program execution, data in the output can become corrupted. The severity of output corruptions depends on the application domain. Hence, different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Based on our AN encoder, the trade-off between fault tolerance and runtime overhead is analyzed. It is found that, by suitably configuring our AN encoder, the runtime overhead can be reduced from 9.9× to 2.1×. At the same time, however, the probability that a hardware fault in the CPU will result in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. It is further demonstrated, by applying different configurations of our AN encoder to the components of an arithmetic expression interpreter, that having fine-grained control over levels of fault tolerance can be beneficial.
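For readers unfamiliar with AN encoding, the following minimal Python sketch illustrates the underlying idea. It is not the authors' LLVM-based encoder. Each value x is represented by the code word x·A, every valid code word must be divisible by A, and a check before decoding flags corrupted values. The constant `A = 58659` and the helper names `encode`, `is_valid`, `decode`, `add` and `mul` are illustrative assumptions. Python's unbounded integers also sidestep the fixed-width overflow handling that encoded machine code has to deal with.

```python
# Minimal AN-code sketch (illustrative only; not the authors' LLVM-based encoder).
# Assumptions: A is an arbitrary illustrative odd constant, and Python's unbounded
# ints stand in for machine words, so fixed-width overflow effects are ignored.

A = 58659  # hypothetical encoding constant


def encode(x: int) -> int:
    """Encode a functional value x as the code word x * A."""
    return x * A


def is_valid(xc: int) -> bool:
    """A code word is valid iff it is a multiple of A."""
    return xc % A == 0


def decode(xc: int) -> int:
    """Check the code word, then recover the functional value."""
    if not is_valid(xc):
        raise ValueError("AN check failed: corrupted value detected")
    return xc // A


def add(ac: int, bc: int) -> int:
    # (a*A) + (b*A) = (a+b)*A, so addition works directly on code words.
    return ac + bc


def mul(ac: int, bc: int) -> int:
    # (a*A) * (b*A) = a*b*A*A, so one factor of A is divided out (exact division).
    return (ac * bc) // A


if __name__ == "__main__":
    ac, bc = encode(6), encode(7)
    # Compute (6 + 7) * 7 entirely on code words, then decode the result.
    print(decode(mul(add(ac, bc), bc)))
    # Simulate a transient fault by flipping bit 3 of an encoded intermediate.
    faulty = add(ac, bc) ^ (1 << 3)
    print(is_valid(faulty))
```

Running the sketch prints `91` and then `False`. Because A is odd, a single bit flip changes a value by ±2^k, which is never a multiple of A, so in this unbounded-integer model every single-bit error leaves an invalid code word. How many such checks a compiler inserts, and where, is presumably the kind of setting that trades fault tolerance against runtime overhead.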
Details
Original language | English |
---|---|
Title | ACM International Conference on Computing Frontiers 2017, CF 2017 |
Publisher | Association for Computing Machinery, Inc |
Pages | 183-190 |
Number of pages | 8 |
ISBN (electronic) | 9781450344876 |
Publication status | Published - 15 May 2017 |
Peer-reviewed | Yes |
Conference
Title | 14th ACM International Conference on Computing Frontiers, CF 2017 |
---|---|
Dates | 15-17 May 2017 |
City | Siena |
Country | Italy |
External IDs
ORCID | /0000-0002-5007-445X/work/141545564 |
---|
Keywords
- Code generation, Error detection, Fault injection, LLVM, Resilience, Soft errors, Transient hardware faults