Practical Byzantine Fault Tolerance

May 30, 2012
Miguel Castro | MSRC

This lecture is about implementing Byzantine fault tolerant state machine replication. A Byzantine faulty replica can behave arbitrarily, for example, it may be controlled by an attacker, whereas algorithms like Paxos assume that faulty replicas fail by stopping. Byzantine fault tolerant state machine replication works correctly if less than a third of the replicas are faulty. The lecture starts by discussing assumptions in replication algorithms and how they can be invalidated by an attacker. Then it describes the Castro-Liskov algorithm, explains how it works, and presents a number of important optimizations. Next it discusses implementation issues including a methodology for reusing existing code to implement replicated state machines, and a Byzantine fault tolerant replicated file system built using this methodology. The performance of this implementation is discussed next followed by a brief discussion of other work in Byzantine fault tolerance.

Speaker Details

I work at Microsoft Research on distributed systems, networking, and security. Before joining MSR, I was a graduate student in the Programming Methodology Group at the MIT Laboratory for Computer Science working on object-oriented databases and Byzantine fault tolerance.