We consider the problem of consensus in the challenging classic model. In this model, the adversary is adaptive; it can choose which processors crash at any point during the course of the algorithm. Further, communication is via asynchronous message passing: there is no known upper bound on the time to send a message from one processor to another, and all messages and coin flips are seen by the adversary.
We describe a new randomized consensus protocol for this model with expected message complexity O(n² log² n) when fewer than n/2 processes may fail by crashing. This is an almost-linear improvement over the best previously known protocol, and within logarithmic factors of a known Ω(n²) message lower bound. The protocol further ensures that no process sends more than O(n log³ n) messages in expectation, which is also within logarithmic factors of optimal. We also present a generalization of the algorithm to an arbitrary number of failures t, which uses expected O(nt + t² log² t) total messages.
Our approach is to build a message-efficient, resilient mechanism for aggregating individual processor votes, implementing a message-passing weak shared coin. Roughly, a processor first announces its votes to small groups, then propagates them to increasingly larger groups as it generates more and more votes. The main technical difficulty is bounding the impact of votes that are still “in flight” (generated, but not fully propagated) on the final outcome of the shared coin, and is especially challenging since the protocol bounds the number of messages that an individual process might send or receive. Overall, our paper builds on tools and techniques from the shared-memory literature to (almost-optimally) solve a classic problem in the message-passing model.