Automatically Tolerating Arbitrary Faults in Non-malicious Settings

Diogo Behrens; Stefan Weigert; Christof Fetzer

doi:doi:10.1109/LADC.2013.26

Automatically Tolerating Arbitrary Faults in Non-malicious Settings

Publikation: Beitrag zu Konferenzen › Paper › Beigetragen › Begutachtung

Beitragende

Diogo Behrens - , Professur für Systems Engineering (SE) (Autor:in)
Stefan Weigert - , Professur für Systems Engineering (SE) (Autor:in)
Christof Fetzer - , Professur für Systems Engineering (SE) (Autor:in)

Abstract

Arbitrary faults such as bit flips have been often observed in commodity-hardware data centers and have disrupted large services. Benign faults, such as crashes and message omissions, are nevertheless the standard assumption in practical fault-tolerant distributed systems. Algorithms tolerant to arbitrary faults are harder to understand and more expensive to deploy (requiring more machines). In this work, we introduce a non-malicious arbitrary fault model including transient and permanent arbitrary faults, such as bit flips and hardware-design errors, but no malicious faults, typically caused by security breaches. We then present a compiler-based framework that allows benign fault-tolerant algorithms to automatically tolerate arbitrary faults in non-malicious settings. Finally, we experimentally evaluate two fundamental algorithms: Paxos and leader election. At expense of CPU cycles, transformed algorithms use the same number of processes as their benign fault-tolerant counterparts, and have virtually no network overhead, while reducing the probability of failing arbitrarily by two orders of magnitude.

Details

Originalsprache	Englisch
Seiten	114-123
Seitenumfang	10
Publikationsstatus	Veröffentlicht - 2013
Peer-Review-Status	Ja

Konferenz

Titel	Sixth Latin-American Symposium on Dependable Computing (LADC), IEEE Computer Society, 2013
Kurztitel	LADC 2013
Veranstaltungsnummer
Dauer	1 Mai 2013
Bekanntheitsgrad	Internationale Veranstaltung
Ort
Stadt	Rio de Janeiro
Land	Brasilien

Externe IDs

Scopus	84881137463

Schlagworte

Forschungsprofillinien der TU Dresden

Informationstechnologien und Mikroelektronik

DFG-Fachsystematik nach Fachkollegium

Sicherheit und Verlässlichkeit

Schlagwörter

Byzantine faults, fault tolerance, hardware errors, algorithm transformation, arbitrary faults, Computer crashes, Encoding, Fault Tolerance, Fault tolerant systems, Hardware, distributed algorithms, Transforoms, Byzantine faults, hardware errors

Forschungsportal der TU Dresden