Failure Detection and Consensus in the Crash-recovery Model

Marcos K. Aguilera; Wei Chen; Sam Toueg

In Proceedings of the 12th International Symposium on Distributed Computing (DISC'98), Andros, Greece, Lecture Notes in Computer Science 1499, Springer-Verlag, September 1998, pp. 231-245.

We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable for the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice, namely those with no failures or failure detector mistakes. In such runs, consensus is achieved within 3δ time and with 4n messages, where δ is the maximum message delay and n is the number of processes in the system.
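To make the stated cost concrete, the short sketch below simply evaluates the abstract's bounds (time within 3δ, 4n messages) for a few example values of n and δ. It is only an arithmetic illustration: the helper name `nice_run_cost` and the sample values are hypothetical, and the code does not model or simulate the consensus algorithms themselves.

```python
def nice_run_cost(n: int, delta: float) -> tuple[float, int]:
    """Hypothetical helper: return (time bound, message count) for a run with
    no failures or failure detector mistakes, using the abstract's figures of
    3*delta time and 4*n messages. It does not simulate any protocol."""
    if n < 1:
        raise ValueError("n must be a positive number of processes")
    if delta < 0:
        raise ValueError("delta (maximum message delay) must be non-negative")
    return 3 * delta, 4 * n


# Example values are illustrative assumptions, not measurements.
for n, delta_ms in [(3, 5.0), (5, 10.0), (7, 20.0)]:
    t, m = nice_run_cost(n, delta_ms)
    print(f"n={n}, delta={delta_ms} ms -> within {t} ms, {m} messages")
```

For instance, with n = 5 processes and a maximum message delay of 10 ms, these bounds give consensus within 30 ms using 20 messages. The message count grows linearly in n, and the time bound does not depend on n at all.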