User-Constraint and Self-Adaptive Fault Tolerance for Event Stream Processing Systems

Publication: Contribution to book/conference proceedings/anthology/report; contribution to conference proceedings; contributed; peer-reviewed

Contributors

Abstract

Event Stream Processing (ESP) systems are currently enabling a renaissance in the data processing area, as they provide results at low latency compared to the traditional MapReduce approach. Although the majority of ESP systems offer some form of fault tolerance to their users, the provided fault tolerance scheme is often not tailored to the application at hand. For example, active replication is well suited for critical applications where unresponsiveness due to a background recovery process is not acceptable. However, for other classes of applications without such tight constraints, passive replication, based on checkpoints and logging, is a better choice, as it can save a significant amount of resources compared to active replication. In this paper, we present StreamMine3G, a fault-tolerant and elastic ESP system that employs several fault tolerance schemes: passive and active replication, as well as intermediate alternatives such as active and passive standby. To free the user from the burden of choosing the correct scheme for the application at hand, StreamMine3G is equipped with a fault-tolerance controller that transitions between the employed schemes at runtime in response to the evolution of the given workload and the user-provided constraints (recovery time and recovery semantics, i.e., gap or precise). Our evaluation shows that the overall resource footprint for fault tolerance can be considerably reduced using our adaptive approach without affecting the recovery time.
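The controller idea in the abstract can be illustrated with a minimal sketch: among the four schemes, pick the one with the lowest resource cost whose estimated recovery time still meets the user's constraint. Everything below is a hypothetical illustration and not StreamMine3G's actual controller or API: the names `Scheme`, `estimate_recovery_s`, and `choose_scheme`, the cost figures, the per-event replay times, and the assumption that gap recovery skips event replay are placeholders chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scheme:
    name: str
    resource_cost: float       # relative resource overhead (placeholder value)
    base_recovery_s: float     # fixed recovery delay in seconds (placeholder value)
    replay_s_per_event: float  # time to replay one logged event (placeholder value)


# Assumed ordering: cheaper schemes recover more slowly. The numbers are invented.
SCHEMES = [
    Scheme("passive replication", 1.0, 5.0, 0.001),
    Scheme("passive standby", 1.4, 2.0, 0.001),
    Scheme("active standby", 1.8, 0.5, 0.0),
    Scheme("active replication", 2.0, 0.0, 0.0),
]


def estimate_recovery_s(scheme: Scheme, event_rate: float,
                        checkpoint_interval_s: float, precise: bool) -> float:
    """Estimate recovery time for one scheme under the current workload."""
    # Assumption: only precise recovery replays events logged since the last
    # checkpoint; gap recovery tolerates losing them and skips the replay.
    backlog = event_rate * checkpoint_interval_s if precise else 0.0
    return scheme.base_recovery_s + scheme.replay_s_per_event * backlog


def choose_scheme(event_rate: float, max_recovery_s: float, precise: bool,
                  checkpoint_interval_s: float = 10.0) -> Optional[Scheme]:
    """Return the cheapest scheme that meets the recovery-time constraint."""
    feasible = [
        s for s in SCHEMES
        if estimate_recovery_s(s, event_rate, checkpoint_interval_s, precise)
        <= max_recovery_s
    ]
    if not feasible:
        return None  # no scheme satisfies the constraint
    return min(feasible, key=lambda s: s.resource_cost)


if __name__ == "__main__":
    # Re-evaluate the choice as the workload (events per second) evolves.
    for rate in (100.0, 2000.0):
        chosen = choose_scheme(rate, max_recovery_s=10.0, precise=True)
        print(rate, chosen.name if chosen else "no feasible scheme")
```

Under these placeholder numbers, a rising event rate lengthens the replay backlog, so a controller of this kind would move from a passive scheme toward an active one, and move back when the load drops. A real controller would also need measured estimates in place of the constants and a way to account for the cost of the transition itself.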

Details

Original language: English
Title: Proceedings of The 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2015)
Place of publication: Los Alamitos, CA, USA
Publisher: IEEE Computer Society, Washington
Pages: 462-473
Number of pages: 12
Publication status: Published - 1 June 2015
Peer-review status: Yes

External IDs

Scopus 84950103669

Keywords

  • checkpointing, data handling, fault tolerant computing, parallel processing, ESP systems, MapReduce approach, StreamMine3G, background recovery process, data processing area, elastic ESP system, event stream processing systems, fault tolerant system, fault-tolerance controller, recovery time, resource footprint, self-adaptive fault tolerance scheme, user-constraint fault tolerance scheme, Computer crashes, Data processing, Fault tolerance, Fault tolerant systems, Peer-to-peer computing, Storms, Synchronization, active replication, active standby, adaptation, deterministic execution, gap recovery, passive replication, passive standby, precise recovery