Enforcing Perfect Failure Detection

Research output: Contribution to conferences, Paper, Contributed, peer-review

Abstract

Perfect failure detectors can correctly decide whether a computer has crashed. However, it is impossible to implement a perfect failure detector in purely asynchronous systems. We show how to enforce perfect failure detection in timed distributed systems with hardware watchdogs. The two main system model assumptions are: (1) each computer can measure time intervals with a known maximum error; and (2) each computer has a watchdog that crashes the computer unless the watchdog is periodically updated. We have implemented a system that satisfies both assumptions using a combination of off-the-shelf software and hardware.
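The two assumptions in the abstract can be made concrete with a small simulation. The Python below is a hypothetical sketch, not the authors' system. `Watchdog` stands in for the hardware device and is assumed to crash its host once `timeout` real time units pass without an update. `conservative_wait` assumes that a local clock reading of an interval of real length d lies between d(1 - ρ) and d(1 + ρ), with ρ passed as `max_rel_error`. Under that assumption a reading of D(1 + ρ) implies that at least D real time units have passed. A monitor that starts waiting at an instant it knows is after a peer's last possible update, and waits `conservative_wait(timeout, rho)` on its own clock, can therefore be sure the peer's watchdog has fired. How a peer is made to stop updating its watchdog is protocol-specific and is not modelled here.

```python
"""Hypothetical sketch, not the paper's implementation: a simulated watchdog
plus a helper that turns a real-time bound into a safe local-clock wait."""
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Simulated hardware watchdog (assumption): the host counts as crashed
    once `timeout` time units pass without an update ("kick")."""
    timeout: float
    last_kick: float = 0.0
    crashed: bool = False

    def check(self, now: float) -> bool:
        # Fire, i.e. crash the host, once the update deadline has been reached.
        if now - self.last_kick >= self.timeout:
            self.crashed = True
        return self.crashed

    def kick(self, now: float) -> None:
        # An update only counts if the watchdog has not already fired;
        # a crashed host cannot revive itself by kicking late.
        if not self.check(now):
            self.last_kick = now


def conservative_wait(real_duration: float, max_rel_error: float) -> float:
    """Local-clock duration guaranteeing that at least `real_duration` real
    time has elapsed, assuming every local measurement of a real interval d
    reads within d * (1 - max_rel_error) .. d * (1 + max_rel_error)."""
    if not 0 <= max_rel_error < 1:
        raise ValueError("max_rel_error must be in [0, 1)")
    # A reading L implies real time >= L / (1 + rho), so L = D * (1 + rho)
    # implies real time >= D.
    return real_duration * (1 + max_rel_error)


if __name__ == "__main__":
    wd = Watchdog(timeout=2.0)
    for t in (0.5, 1.0, 1.5):  # healthy node: kicks well within the timeout
        wd.kick(t)
    # The node hangs after t = 1.5 and stops kicking.
    print(wd.check(3.0))  # False: only 1.5 units since the last kick
    print(wd.check(3.5))  # True: 2.0 units since the last kick
    print(conservative_wait(2.0, 0.01))
```

Making the watchdog refuse late kicks mirrors the property the abstract relies on: once the update deadline is missed, the computer stays crashed.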

Details

Original language: English
Pages: 350-359
Number of pages: 10
Publication status: Published - 2001
Peer-reviewed: Yes

Conference

Title: 2001 21st International Conference on Distributed Computing Systems
Abbreviated title: ICDCS 2001
Conference number: 21
Duration: 16-19 April 2001
Degree of recognition: International event
City: Mesa
Country: United States of America

Keywords

  • perfect failure detection, crash failures, asynchronous distributed systems, timed asynchronous system model, Computer crashes, detectors, time measurement, Computer errors, Fault tolerant systems, Clocks, Error correction, Fault detection, Heart beat, Fault tolerant computing, distributed processing, system recovery, purely asynchronous systems, timed distributed systems, hardware watchdogs, time intervals, off-the-shelf software