Trading fault tolerance for performance in AN encoding
Research output: Conference contribution (contribution to book/conference proceedings/anthology/report), contributed, peer-reviewed
Abstract
Increasing rates of transient hardware faults pose a problem for computing applications, and current and future hardware trends are likely to exacerbate it. When a transient fault occurs during program execution, data in the output can become corrupted. How severe such output corruption is depends on the application domain, so different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Using this AN encoder, we analyze the trade-off between fault tolerance and runtime overhead. We find that, by suitably configuring the AN encoder, the runtime overhead can be reduced from 9.9× to 2.1×. At the same time, however, the probability that a hardware fault in the CPU results in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. By applying different configurations of the AN encoder to the components of an arithmetic expression interpreter, we further demonstrate that fine-grained control over levels of fault tolerance can be beneficial.
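The core idea behind AN encoding can be illustrated with a small sketch. It is not the paper's LLVM-based encoder. It shows only the general principle: every value x is stored as A·x, arithmetic is performed on the encoded values, and any result that is not a multiple of A signals a fault. The constant `A = 58659` is an arbitrary odd value chosen for illustration. The `check_operands` flag is a hypothetical knob that stands in for "configurable levels of rigor": skipping operand checks saves work, but a corrupted value is only caught at a later check (here, `decode`). Python's unbounded integers also sidestep the word-size overflow handling that a real encoder must deal with.

```python
# Minimal AN-encoding sketch (illustrative only; not the paper's encoder).
# Assumption: A is an arbitrary odd constant chosen for illustration.
A = 58659


class ANFaultDetected(Exception):
    """Raised when an encoded value is not a valid code word (multiple of A)."""


def encode(x: int) -> int:
    # An encoded value is the functional value multiplied by A.
    return x * A


def check(xc: int) -> None:
    # Valid code words are exactly the multiples of A.
    if xc % A != 0:
        raise ANFaultDetected(f"corrupted code word: {xc}")


def decode(xc: int) -> int:
    # Always verify before leaving the encoded domain.
    check(xc)
    return xc // A


def add(ac: int, bc: int, check_operands: bool = True) -> int:
    # A*a + A*b = A*(a + b): addition works directly on code words.
    # check_operands is a hypothetical knob modelling a stricter configuration.
    if check_operands:
        check(ac)
        check(bc)
    return ac + bc


def mul(ac: int, bc: int, check_operands: bool = True) -> int:
    # (A*a) * (A*b) = A*A*a*b; dividing by A leaves a single factor of A.
    if check_operands:
        check(ac)
        check(bc)
    return (ac * bc) // A


def flip_bit(xc: int, k: int) -> int:
    # Simulate a transient fault by flipping bit k of a code word.
    # The value changes by +/- 2**k, which is never a multiple of an odd A > 1,
    # so any single bit flip is caught by check().
    return xc ^ (1 << k)


if __name__ == "__main__":
    s = add(encode(20), encode(22))
    print(decode(s))  # 42
    print(decode(mul(encode(6), encode(7))))  # 42
    try:
        decode(flip_bit(s, 3))
    except ANFaultDetected as err:
        print("detected:", err)
```

Making the checks optional mirrors the trade-off discussed in the abstract: fewer checks mean less runtime overhead but a longer window in which a corrupted value can propagate before detection.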
Details
Original language | English |
---|---|
Title of host publication | ACM International Conference on Computing Frontiers 2017, CF 2017 |
Publisher | Association for Computing Machinery, Inc |
Pages | 183-190 |
Number of pages | 8 |
ISBN (electronic) | 9781450344876 |
Publication status | Published - 15 May 2017 |
Peer-reviewed | Yes |
Conference
Title | 14th ACM International Conference on Computing Frontiers, CF 2017 |
---|---|
Duration | 15 - 17 May 2017 |
City | Siena |
Country | Italy |
External IDs
ORCID | /0000-0002-5007-445X/work/141545564 |
---|---|
Keywords
- Code generation, Error detection, Fault injection, LLVM, Resilience, Soft errors, Transient hardware faults