International Conference on Dependable Systems and Networks (DSN 2004) |
A system that can tolerate a single non-Byzantine failure requires three processors. It has long been realized that only two of those processors need to maintain the system state, but the third processor must take part in every decision to maintain consistency. Mike Massa, at the time an engineer at Microsoft, observed that if we weaken the fault-tolerance guarantee, then the third processor needs to be used only in the case of failure or repair of one of the other two processors. The third processor can then be a less powerful machine or a process run occasionally on a computer devoted to other tasks. I generalized his idea to a variation of the Paxos algorithm of  called Cheap Paxos that tolerates up to f failures with f+1 main processors and f auxiliary ones. A paper on this algorithm was rejected from the PODC and DISC conferences. Most of the referees thought that it just presented the old idea that only the main processors need to maintain the system state, not realizing that it differed from the old approach because the remaining f processors need not take part in every decision. One review contained a silly assertion that it was easy to solve the problem in a certain way. When trying to prove his or her assertion false, I discovered a somewhat simpler version of Cheap Paxos that achieved the same result as the original version. (The new algorithm wasn’t at all what the referee said should be done, which was impossible.) This paper describes the simpler algorithm. The original algorithm actually has some advantage over the new one, but it remains unpublished.