Distributed Speculative Execution: A Programming Model for Reliability and Increased Performance


April 18, 2007


Cristian Tapus


Center for Advanced Computing Research (CACR), Pasadena, CA


Reliability and fault tolerance are key issues in software design and development. In this talk, I will present the use of speculations, a form of distributed transactions, to improve the reliability and fault tolerance of distributed systems and to provide better performance. A speculation is a computation based on an assumption whose validation is performed concurrently with the computation. If the assumption is found to be false, the computation is aborted and the state of the program is rolled back; if the assumption is validated, the results of the computation are committed. The main difference between a speculation and a transaction is that a speculation is not isolated. A speculative computation may send and receive messages, and it may modify shared objects. As a result, processes that share those objects or receive speculative messages may be absorbed into the speculation. Speculations provide three main advantages over the traditional programming model: (1) they may be used to reduce the amount of error-recovery code in programs, making it easier to reason about the computations; (2) they provide an exception-like mechanism that extends to a distributed environment; (3) they allow for optimistic execution, which may increase the performance of programs. I will first present the syntax and an operational semantics for nested speculative execution, and then describe the challenges we encountered when implementing support for speculative execution as a kernel-level service. In conclusion, I will present some applications of this programming model and discuss future directions of research.
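The commit-or-rollback behavior described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the kernel-level service discussed in the talk: the function `speculate` and its parameters `validate` and `compute` are hypothetical names introduced here, and the sketch deliberately omits message passing, shared objects, nesting, and the absorption of other processes into a speculation. It only shows a computation running on a working copy of the state while its assumption is checked concurrently.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def speculate(state, validate, compute):
    """Run compute() on a copy of state while validate() runs concurrently.

    Returns (resulting_state, committed). All names here are illustrative
    assumptions, not an API from the talk.
    """
    working = copy.deepcopy(state)  # speculative copy; the original is the checkpoint
    with ThreadPoolExecutor(max_workers=2) as pool:
        check = pool.submit(validate)          # validation of the assumption
        work = pool.submit(compute, working)   # speculative computation
        failed = work.exception() is not None  # waits for the computation to finish
        assumption_holds = check.result()      # waits for the validation to finish
    if assumption_holds and not failed:
        return working, True   # commit: keep the speculative results
    return state, False        # abort: roll back to the untouched original


def debit(account):
    # Speculative work: assumes the transfer will be approved.
    account["balance"] -= 30


account = {"balance": 100}
print(speculate(account, lambda: True, debit))   # ({'balance': 70}, True)
print(speculate(account, lambda: False, debit))  # ({'balance': 100}, False)
```

Because the computation only ever touches the working copy, an abort needs no explicit recovery code in `debit`, which mirrors the first advantage listed in the abstract.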


Cristian Tapus

Cristian Tapus is a postdoctoral scholar at the Center for Advanced Computing Research (CACR) in Pasadena, CA. He holds a Ph.D. from the Computer Science Department at Caltech, an M.S. in computer science from both the University of Maryland, College Park, and Caltech, and a B.S. degree in Engineering and Applied Sciences from Caltech. His research focuses on designing and implementing safe, robust, and reliable distributed systems and on providing new programming models for such environments.