Transactional Encoding for Tolerating Transient Hardware Errors

Publikation: Beitrag in Buch/Konferenzbericht/Sammelband/GutachtenBeitrag in Buch/Sammelband/GutachtenBeigetragenBegutachtung

Beitragende

Abstract

The decreasing feature size of integrated circuits leads to less reliable hardware with higher likelihood for errors. Without adding additional failure detection and masking mechanisms, the next generations of CPUs would at least be unfit for executing mission- and safety-critical applications. One common approach is the replicated execution of programs on redundant cores, which is increasingly difficult considering that most programs are non-deterministic. To be able to detect and mask execution errors, one typically need to execute three copies of each thread.

In this paper, we propose and evaluate transactional encoding, a novel approach to detect and mask transient hardware errors such that one can build safe applications on top of unreliable components. Transactional encoding relies on a combination of arithmetic codes for detecting transient hardware errors and transactional memory for recovery and tolerance of transient errors. We present a prototype software implementation that encodes applications using an LLVM-based compiler and executes them with a customized software transactional memory algorithm. Our evaluation shows that our system can successfully survive between 90-96% of transient hardware errors.

Details

OriginalspracheEnglisch
TitelStabilization, Safety, and Security of Distributed Systems: 15th International Symposium (SSS 2015)
Seiten1-16
Seitenumfang16
Band8255
PublikationsstatusVeröffentlicht - 2013
Peer-Review-StatusJa

Externe IDs

Scopus 84893972869

Schlagworte

Forschungsprofillinien der TU Dresden

DFG-Fachsystematik nach Fachkollegium

Schlagwörter

  • Error Detectin, Code Word, Error Recovery, Trnsactional memory, arithmetic code