Journal of the ACM Bibliography

Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225-267, March 1996. [BibTeX entry]
Abstract

We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties -- completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].

The abstract is also available as a LaTeX file, a DVI file, or a PostScript file.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems -- distributed applications, distributed databases, network operating systems; C.4 [Performance of Systems]; D.1.3 [Programming Techniques]: Concurrent Programming -- distributed programming; D.4.5 [Operating Systems]: Reliability -- fault-tolerance; F.1.1 [Computation by Abstract Devices]: Models of Computation -- automata, relations among models; F.1.2 [Computation by Abstract Devices]: Modes of Computation -- parallelism and concurrency; H.2.4 [Database Management]: Systems -- concurrency, distributed systems, transaction processing

General Terms: Algorithms, Reliability, Theory

Additional Key Words and Phrases: agreement problem, asynchronous systems, atomic broadcast, \emph{Byzantine Generals'} problem, commit problem, consensus problem, crash failures, failure detection, fault-tolerance, message passing, partial synchrony, processor failures

Selected papers that cite this one

Selected references


Shortcuts:

  • Journal of the ACM homepage
  • Bibliography top level
  • Journal of the ACM Author Index
  • Search the HBP database