Using the Heartbeat Failure Detector for Quiescent Reliable Communication and Consensus in Partitionable Networks

Theoretical Computer Science, Elsevier Science, invited paper in the special issue on distributed algorithms, 220:1, June 1999, pp. 3-30.

We consider partitionable networks with process crashes and lossy links, and focus on the problems of reliable communication and consensus for such networks. For both problems we seek algorithms that are quiescent, i.e., algorithms that eventually stop sending messages. We first tackle the problem of reliable communication for partitionable networks by extending the results of Aguilera et al. (1997). In particular, we generalize the specification of the heartbeat failure detector HB, show how to implement it, and show how to use it to achieve quiescent reliable communication. We then turn our attention to the problem of consensus for partitionable networks. We first show that, even though this problem can be solved using a natural extension of failure detector ◇S, such solutions are not quiescent; in other words, ◇S alone is not sufficient to achieve quiescent consensus in partitionable networks. We then solve this problem using ◇S and the quiescent reliable communication primitives that we developed in the first part of the paper.
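
As an informal illustration of why a heartbeat detector can make retransmission quiescent, the following toy sketch simulates a single sender and a single peer over a lossy link. It is not the paper's algorithm and does not model partitions. It assumes, as in the heartbeat detector of Aguilera et al. (1997), that the detector exposes a per-peer counter that keeps increasing while the peer is alive and stops increasing once it crashes. The names `Receiver`, `send_quiescently`, `crash_at`, and `drop_first` are hypothetical, and acknowledgements are assumed never to be lost.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Receiver:
    """Hypothetical peer. While alive, its heartbeat counter keeps growing."""
    crash_at: Optional[int] = None  # step at which the peer crashes (None = never)
    heartbeat: int = 0              # counter as reported by the sender's detector
    delivered: bool = False

    def alive(self, step: int) -> bool:
        return self.crash_at is None or step < self.crash_at

    def tick(self, step: int) -> None:
        # Assumption: the counter increases only while the peer is alive.
        if self.alive(step):
            self.heartbeat += 1


def send_quiescently(peer: Receiver, drop_first: int,
                     max_steps: int = 50) -> Tuple[int, bool]:
    """Retransmit one message only when the peer's counter has increased.

    The link loses the first `drop_first` copies (a stand-in for a lossy link).
    Returns (copies sent, whether the message was acknowledged).
    """
    sent = 0
    last_seen = 0
    for step in range(max_steps):
        peer.tick(step)
        if peer.heartbeat > last_seen:
            # Fresh evidence that the peer is alive: send one more copy.
            last_seen = peer.heartbeat
            sent += 1
            if sent > drop_first:
                peer.delivered = True
                return sent, True   # acknowledged, so the sender stops
        # Counter unchanged: send nothing, which is what keeps the sender quiet.
    return sent, False


if __name__ == "__main__":
    print(send_quiescently(Receiver(), drop_first=3))
    print(send_quiescently(Receiver(crash_at=2), drop_first=10))
```

In the second call the peer crashes before any copy gets through. The sender transmits only while the counter is still rising and then stays silent for the rest of the run, without ever having to decide that the peer has failed.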