How to Build a Highly Available System Using Consensus
Published by Springer
Editor(s): Ozalp Babaoglu and Keith Marzullo
Proceedings: Distributed Algorithms, Lecture Notes in Computer Science 1151, Springer, 1996.
Lamport showed that a replicated deterministic state machine is a general way to implement a highly available system, given a consensus algorithm that the replicas can use to agree on each input. His Paxos algorithm is the most fault-tolerant way to get consensus without real-time guarantees. Because general consensus is expensive, practical systems reserve it for emergencies and use leases (locks that time out) for most of the computing. This paper explains the general scheme for efficient highly available computing, gives a general method for understanding concurrent and fault-tolerant programs, and derives the Paxos algorithm as an example of the method.
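
The replicated state machine idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the `Replica` class, the key-value command format, and the `agreed_log` list are hypothetical names chosen for illustration, and the consensus step itself (Paxos) is assumed to have already fixed the order of inputs. The point shown is only that deterministic replicas applying the same agreed inputs in the same order end in the same state, even if one of them lags and catches up later.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic state machine: a key-value store driven by an input log."""
    state: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply(self, command):
        # Hypothetical command format: ("set", key, value) or ("del", key).
        op = command[0]
        if op == "set":
            _, key, value = command
            self.state[key] = value
        elif op == "del":
            self.state.pop(command[1], None)
        else:
            raise ValueError(f"unknown command: {op!r}")
        self.applied += 1

    def catch_up(self, log):
        # Apply every agreed input this replica has not yet seen, in log order.
        for command in log[self.applied:]:
            self.apply(command)


# Assumption: each slot of this log was already chosen by a consensus algorithm.
agreed_log = [("set", "x", 1), ("set", "y", 2), ("del", "x"), ("set", "y", 3)]

a, b = Replica(), Replica()
a.catch_up(agreed_log)        # a sees the whole log at once
b.catch_up(agreed_log[:2])    # b lags behind after two inputs
b.catch_up(agreed_log)        # b later catches up on the rest

print(a.state == b.state)
```

Because `apply` depends only on the current state and the command, any replica that has applied the same prefix of the log holds the same state, which is what lets a surviving replica take over from a failed one.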