Low Cost Synchronization for Actively Replicated Data Streams
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › Peer-reviewed
Abstract
Active replication is an attractive fault tolerance approach for data stream applications because it provides almost instantaneous recovery, which matches the low-latency requirements of real-time data analytics. Despite this quick recovery, it is rarely used in industry: it requires complex mechanisms such as atomic broadcast to ensure correctness and introduces non-negligible overhead. The majority of data stream applications compute over event windows using operators such as aggregations or joins, which share the property of commutativity: the correctness of the result does not depend on the order of events within a window. In this paper, we exploit this ordering flexibility by proposing (i) an epoch-based deterministic merge algorithm that provides correctness at a much lower cost than achieving strict ordering through a full-fledged atomic broadcast protocol or deterministic execution. We furthermore propose (ii) a leader-follower protocol as an extension to this approach that lowers the latency impact of stragglers and stops the propagation of any non-determinism originating from source operators. Our evaluation shows that throughput can be improved by an order of magnitude compared to strict ordering while providing the same guarantees.
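The ordering flexibility described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it only assumes that every event carries an epoch identifier and that the operator is a commutative aggregation (a sum). The event layout `(epoch, source, value)` and the function name `merge_by_epoch` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event layout: (epoch, source, value).
Event = Tuple[int, str, int]


def merge_by_epoch(events: Iterable[Event]) -> List[Tuple[int, int]]:
    """Sum the values of each epoch and return the sums in epoch order.

    Addition is commutative, so the order in which events of the same
    epoch arrive does not affect the per-epoch result.
    """
    sums: Dict[int, int] = defaultdict(int)
    for epoch, _source, value in events:
        sums[epoch] += value
    # Emitting in epoch order is the only ordering the replicas must agree on.
    return sorted(sums.items())


# Two replicas receive the same events with different interleavings of the
# two sources (each source's own events still arrive in epoch order).
replica_a = [(0, "s1", 3), (0, "s2", 4), (1, "s1", 5), (1, "s2", 6)]
replica_b = [(0, "s2", 4), (1, "s2", 6), (0, "s1", 3), (1, "s1", 5)]

result_a = merge_by_epoch(replica_a)
result_b = merge_by_epoch(replica_b)
```

In this sketch both replicas produce identical per-epoch sums without agreeing on a total order of individual events; they only need to agree on which epoch each event belongs to.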
Details
Original language | Undefined |
---|---|
Title | 2019 9th Latin-American Symposium on Dependable Computing (LADC) |
Pages | 1-10 |
Number of pages | 10 |
Publication status | Published - Nov. 2019 |
Peer-review status | Yes |
External IDs
Scopus | 85081594156 |
---|
Keywords
- Controlled Indexing, broadcast communication, computer network reliability, data analysis, fault tolerant computing, merging, protocols, synchronisation, attractive fault tolerance approach, data stream applications, nondeterminism propagation, deterministic execution, full-fledged atomic broadcast protocol, epoch-based deterministic merge algorithm, atomic broadcast, real time data analytics, instantaneous recovery matching, active replication, actively replicated data streams, low cost synchronization, low latency requirements