Automatically Tolerating Arbitrary Faults in Non-malicious Settings

Research output: Contribution to conferences › Paper › Contributed › peer-review

Contributors

Abstract

Arbitrary faults, such as bit flips, have often been observed in commodity-hardware data centers and have disrupted large services. Benign faults, such as crashes and message omissions, are nevertheless the standard assumption in practical fault-tolerant distributed systems. Algorithms tolerant to arbitrary faults are harder to understand and more expensive to deploy (requiring more machines). In this work, we introduce a non-malicious arbitrary fault model that includes transient and permanent arbitrary faults, such as bit flips and hardware-design errors, but excludes malicious faults, which are typically caused by security breaches. We then present a compiler-based framework that allows benign fault-tolerant algorithms to automatically tolerate arbitrary faults in non-malicious settings. Finally, we experimentally evaluate two fundamental algorithms: Paxos and leader election. At the expense of CPU cycles, the transformed algorithms use the same number of processes as their benign fault-tolerant counterparts and have virtually no network overhead, while reducing the probability of failing arbitrarily by two orders of magnitude.

Details

Original language: English
Pages: 114-123
Number of pages: 10
Publication status: Published - 2013
Peer-reviewed: Yes

Conference

Title: Sixth Latin-American Symposium on Dependable Computing (LADC), IEEE Computer Society, 2013
Abbreviated title: LADC 2013
Conference number
Duration: 1 May 2013
Degree of recognition: International event
Location
City: Rio de Janeiro
Country: Brazil

External IDs

Scopus 84881137463

Keywords

  • Byzantine faults, fault tolerance, hardware errors, algorithm transformation, arbitrary faults, computer crashes, encoding, fault-tolerant systems, hardware, distributed algorithms, transforms