Automatically Tolerating Arbitrary Faults in Non-malicious Settings
Research output: Contribution to conferences › Paper › Contributed › peer-review
Contributors
Abstract
Arbitrary faults such as bit flips have often been observed in commodity-hardware data centers and have disrupted large services. Benign faults, such as crashes and message omissions, are nevertheless the standard assumption in practical fault-tolerant distributed systems. Algorithms tolerant to arbitrary faults are harder to understand and more expensive to deploy (requiring more machines). In this work, we introduce a non-malicious arbitrary fault model that includes transient and permanent arbitrary faults, such as bit flips and hardware-design errors, but excludes malicious faults, which are typically caused by security breaches. We then present a compiler-based framework that allows benign fault-tolerant algorithms to automatically tolerate arbitrary faults in non-malicious settings. Finally, we experimentally evaluate two fundamental algorithms: Paxos and leader election. At the expense of CPU cycles, transformed algorithms use the same number of processes as their benign fault-tolerant counterparts and have virtually no network overhead, while reducing the probability of failing arbitrarily by two orders of magnitude.
Details
Original language | English |
---|---|
Pages | 114-123 |
Number of pages | 10 |
Publication status | Published - 2013 |
Peer-reviewed | Yes |
Conference
Title | Sixth Latin-American Symposium on Dependable Computing (LADC), IEEE Computer Society, 2013 |
---|---|
Abbreviated title | LADC 2013 |
Conference number | |
Duration | 1 May 2013 |
Degree of recognition | International event |
Location | |
City | Rio de Janeiro |
Country | Brazil |
External IDs
Scopus | 84881137463 |
---|
Keywords
- Byzantine faults, fault tolerance, hardware errors, algorithm transformation, arbitrary faults, Computer crashes, Encoding, Fault tolerant systems, Hardware, distributed algorithms, Transforms