Abstract

In [143], I announced some lower-bound results for the consensus problem. One result states that two message delays are required to choose a value, and a relatively large number of processors are needed to achieve that bound. When writing a careful proof of this result, I realized that it required the hypothesis that values proposed by two different processors could be chosen in two message delays. This led me to realize that fewer processors were needed if there were only one processor whose proposed value could be chosen in two message delays, and values proposed by other processors took longer to be chosen. In fact, a simple modification to the Paxos algorithm of [122] accomplished this.

I then looked for applications of consensus in which there is a single special proposer whose proposed value needs to be chosen quickly. I realized there is a “killer app”–namely, distributed transaction commit. Instead of regarding transaction commit as one consensus problem that chooses the single value commit or abort, it could be presented as a set of separate consensus problems, each choosing the commit/abort desire of a single participant. Each participant then becomes the special proposer for one of the consensus problems. This led to what I call the Paxos Commit algorithm. It is a fault-tolerant (non-blocking) commit algorithm that I believed had fewer message delays in the normal (failure-free) case than any previous algorithm. I later learned that an algorithm published by Guerraoui, Larrea, and Schiper in 1996 had the same normal-case behavior.

Several months later, Jim Gray and I got together to try to understand the relation between Paxos and the traditional Two-Phase Commit protocol. After a couple of hours of head scratching, we figured out that Two-Phase Commit is the trivial version of Paxos Commit that tolerates zero faults. That realization and several months of procrastination led to this paper, which describes the Two-Phase Commit and Paxos Commit algorithms and compares them. It also includes an appendix with TLA+ specifications of the transaction-commit problem and of the two algorithms.

‚Äč