Low Cost Synchronization for Actively Replicated Data Streams
Research output: Conference contribution (contributed, peer-reviewed)
Abstract
Active replication is an attractive fault tolerance approach for data stream applications because it provides almost instantaneous recovery, which matches the low-latency requirements of real-time data analytics. Despite this quick recovery, it is rarely used in industry: it requires complex mechanisms such as atomic broadcast to ensure correctness and introduces non-negligible overhead. Most data stream applications compute over event windows using operators such as aggregations or joins, which are commutative: the correctness of the result does not depend on the order of events within a window. In this paper, we exploit this ordering flexibility by proposing (i) an epoch-based deterministic merge algorithm that provides correctness at a much lower cost than achieving strict ordering through a full-fledged atomic broadcast protocol or deterministic execution. We furthermore propose (ii) a leader-follower protocol as an extension to this approach that reduces the latency impact of stragglers and stops the propagation of any non-determinism originating from source operators. Our evaluation shows that throughput can be improved by an order of magnitude compared to strict ordering while providing the same guarantees.
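The ordering flexibility described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes events are `(timestamp, value)` pairs, that an epoch is a fixed-length timestamp range, and that the operator is a sum. The names `merge_by_epoch` and `windowed_sum` are hypothetical. It only shows why two replicas that receive the same events in different orders within an epoch still produce identical results for a commutative operator.

```python
from collections import defaultdict


def merge_by_epoch(events, epoch_length):
    """Group (timestamp, value) events by epoch number.

    Assumption for illustration: an epoch is a fixed-length timestamp
    range, so the epoch number is timestamp // epoch_length.
    """
    epochs = defaultdict(list)
    for timestamp, value in events:
        epochs[timestamp // epoch_length].append(value)
    return epochs


def windowed_sum(events, epoch_length):
    """Apply a commutative aggregation (sum) to each epoch.

    Epochs are emitted in ascending order; the order of events inside
    an epoch does not affect the sum.
    """
    epochs = merge_by_epoch(events, epoch_length)
    return [(epoch, sum(values)) for epoch, values in sorted(epochs.items())]


# Two replicas receive the same events, interleaved differently
# within each epoch.
replica_a = [(0, 5), (1, 3), (2, 7), (10, 1), (12, 4)]
replica_b = [(2, 7), (0, 5), (1, 3), (12, 4), (10, 1)]

result_a = windowed_sum(replica_a, epoch_length=10)
result_b = windowed_sum(replica_b, epoch_length=10)
print(result_a == result_b)
```

In this sketch the replicas only need to agree on which epoch an event belongs to, not on a total order of all events, which is the intuition behind avoiding a full atomic broadcast.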
Details
Original language | Undefined |
---|---|
Title of host publication | 2019 9th Latin-American Symposium on Dependable Computing (LADC) |
Pages | 1-10 |
Number of pages | 10 |
Publication status | Published - Nov 2019 |
Peer-reviewed | Yes |
External IDs
Scopus | 85081594156 |
---|
Keywords
- Controlled Indexing, broadcast communication, computer network reliability, data analysis, fault tolerant computing, merging, protocols, synchronisation, attractive fault tolerance approach, data stream applications, nondeterminism propagation, deterministic execution, full-fledged atomic broadcast protocol, epoch-based deterministic merge algorithm, atomic broadcast, real time data analytics, instantaneous recovery matching, active replication, actively replicated data streams, low cost synchronization, low latency requirements