Low Cost Synchronization for Actively Replicated Data Streams

Publication: Contribution to book/conference proceedings/anthology/report; Contribution to conference proceedings; Contributed; Peer-reviewed

Abstract

Active replication is an attractive fault tolerance approach for data stream applications because it provides almost instantaneous recovery, which matches the low-latency requirements of real-time data analytics. Despite this quick recovery, it is rarely used in industry: it requires complex mechanisms such as atomic broadcast to ensure correctness, and it introduces a non-negligible overhead. The majority of data stream applications compute over event windows using operators such as aggregations or joins. These operators share the property of commutativity: the correctness of the result does not depend on the order of events within a window. In this paper, we exploit this ordering flexibility by proposing (i) an epoch-based deterministic merge algorithm that provides correctness at a much lower cost than the strict ordering achieved by a full-fledged atomic broadcast protocol or deterministic execution. We furthermore propose (ii) a leader-follower protocol as an extension to this approach that lowers the latency impact of stragglers and stops the propagation of any non-determinism originating from source operators. Our evaluation shows that throughput can be improved by an order of magnitude compared to strict ordering while providing the same guarantees.
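
The abstract does not spell out the merge algorithm, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that every event carries an epoch number, that each input channel delivers in FIFO order and ends each epoch with a marker (here a value of `None`), and that the operator is a commutative sum. The function name `epoch_merge` and the tuple layout are hypothetical. Two replicas that receive the same events in different interleavings then emit identical per-epoch results without agreeing on a total order.

```python
from collections import defaultdict

def epoch_merge(arrivals, num_inputs):
    """Aggregate events epoch by epoch, independent of arrival interleaving.

    arrivals: list of (channel, epoch, value) tuples in the order this
              replica happened to receive them. A value of None is an
              assumed end-of-epoch marker for that channel.
    Returns a list of (epoch, sum) pairs in epoch order.
    """
    sums = defaultdict(int)    # epoch -> running sum (commutative operator)
    closed = defaultdict(set)  # epoch -> channels that have ended the epoch
    results = []
    next_epoch = 0             # epochs are emitted strictly in order

    for channel, epoch, value in arrivals:
        if value is None:
            closed[epoch].add(channel)
        else:
            # Order inside an epoch does not matter for a sum.
            sums[epoch] += value
        # An epoch is complete once every input channel has closed it.
        while len(closed[next_epoch]) == num_inputs:
            results.append((next_epoch, sums[next_epoch]))
            next_epoch += 1
    return results

# Same events, different interleavings at two replicas (per-channel FIFO kept).
replica_a = [(0, 0, 3), (1, 0, 4), (0, 0, None), (1, 0, None),
             (0, 1, 5), (1, 1, 1), (1, 1, None), (0, 1, None)]
replica_b = [(1, 0, 4), (1, 0, None), (1, 1, 1), (0, 0, 3),
             (0, 0, None), (0, 1, 5), (1, 1, None), (0, 1, None)]

out_a = epoch_merge(replica_a, num_inputs=2)
out_b = epoch_merge(replica_b, num_inputs=2)
print(out_a == out_b, out_a)  # True [(0, 7), (1, 6)]
```

In this sketch the only synchronization point is the epoch boundary: replicas must agree on which events belong to which epoch, but not on the order of events inside it.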

Details

Original language: Undefined
Title: 2019 9th Latin-American Symposium on Dependable Computing (LADC)
Pages: 1-10
Number of pages: 10
Publication status: Published - Nov. 2019
Peer-reviewed: Yes

External IDs

Scopus 85081594156

Keywords

  • Controlled Indexing, broadcast communication, computer network reliability, data analysis, fault tolerant computing, merging, protocols, synchronisation, attractive fault tolerance approach, data stream applications, nondeterminism propagation, deterministic execution, full-fledged atomic broadcast protocol, epoch-based deterministic merge algorithm, atomic broadcast, real time data analytics, instantaneous recovery matching, active replication, actively replicated data streams, low cost synchronization, low latency requirements