HardPaxos: Replication Hardened Against Hardware Errors
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
State Machine Replication (SMR) is a common technique to make services fault-tolerant. Practical SMR systems tolerate process crashes, but not hardware errors such as bit flips. Still, hardware errors can cause major service outages, and their rate is expected to increase in the future. Current approaches either incur a high overhead by hardening large parts of the system in software, or increase the cost of ownership by introducing additional hardware components.
This work presents HardPaxos, an atomic broadcast algorithm for SMR that enables services to tolerate hardware errors while incurring little performance and state overhead. HardPaxos requires no additional hardware and has only a small part of its functionality hardened using a combination of AN-encoding and duplicated execution. Our evaluation shows a throughput overhead of at most 5% for typical payload sizes. Moreover, fault injection experiments show that our hardening decreases the number of undetected errors from 15% to 0.02%.
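The abstract names AN-encoding and duplicated execution without describing them. As a rough illustration of the general techniques only, and not of how HardPaxos itself applies them, the following Python sketch encodes integers as multiples of a constant `A` and runs a computation twice to compare the results. The constant value and all function names (`an_encode`, `an_check`, `an_decode`, `an_add`, `flip_bit`, `duplicated`) are hypothetical and chosen for this example.

```python
# Illustrative sketch only: names and the constant A are assumptions,
# not taken from the HardPaxos paper.

A = 58659  # assumed odd constant; any odd A > 1 detects every single bit flip


def an_encode(value: int) -> int:
    """Encode a value as a multiple of A (a valid AN code word)."""
    return value * A


def an_check(code: int) -> bool:
    """A word is a valid code word only if it is divisible by A."""
    return code % A == 0


def an_decode(code: int) -> int:
    """Decode a code word, raising if the divisibility check fails."""
    if not an_check(code):
        raise ValueError("AN-code check failed: possible hardware error")
    return code // A


def an_add(x: int, y: int) -> int:
    """Adding two code words yields the code word of the sum."""
    return x + y


def flip_bit(word: int, bit: int) -> int:
    """Simulate a hardware bit flip at the given bit position."""
    return word ^ (1 << bit)


def duplicated(fn, *args):
    """Run fn twice and compare results (duplicated execution)."""
    first = fn(*args)
    second = fn(*args)
    if first != second:
        raise RuntimeError("duplicated execution mismatch: possible hardware error")
    return first


if __name__ == "__main__":
    total = an_add(an_encode(3), an_encode(4))
    print(an_decode(total))                # decodes the encoded sum
    print(an_check(flip_bit(total, 5)))    # a flipped bit breaks divisibility by A
    print(duplicated(lambda a, b: a + b, 1, 2))
```

A single bit flip changes a word by plus or minus a power of two, and no power of two is divisible by an odd `A` greater than 1, so the corrupted word always fails the divisibility check. In software running on real hardware, the two runs of a duplicated computation can diverge when a transient fault hits one of them; in this pure-Python sketch the comparison only demonstrates the control flow.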
Details
Original language | English |
---|---|
Title | Proceedings of the 33rd IEEE Symposium on Reliable Distributed Systems (SRDS'14) |
Number of pages | 10 |
Publication status | Published - 1 Oct 2014 |
Peer-review status | Yes |
External IDs
Scopus | 84938922909 |
---|---|
Keywords
- hardware errors, Byzantine faults, Paxos