Toward Automated Debugging for Datacenter Applications

  • Gautam Altekar | University of California, Berkeley

Debugging data-intensive distributed applications running in a datacenter (“datacenter applications”) is complex and time-consuming. Developers want a way to debug failed executions with little human effort, but no such tool exists today. In this talk, I will present ADDA, a system that significantly reduces the manual effort needed to debug datacenter applications. Specifically, ADDA enables developers to run powerful automated analyses (such as global invariant checks and distributed data flow) on the executions of large-scale distributed applications, removing the need to manually search and reason through those executions. The key challenge in building ADDA is performing such heavyweight analyses while incurring little in-production overhead. To address this, ADDA uses deterministic replay to offload expensive analyses to an offline replay execution. As a result, ADDA incurs low in-production overhead (about 15%) and largely automates the debugging of real-world failures in applications such as Hypertable and Cassandra. To conclude the talk, I will give a demo of ADDA in action and argue that ADDA brings us one step closer to the holy grail of fully automated datacenter debugging.

Speaker Details

Gautam Altekar is a Ph.D. candidate at the University of California, Berkeley, advised by Professor Ion Stoica. His research interests lie primarily in software reliability, and in particular in the automated debugging of large-scale distributed applications. He enjoys combining techniques from operating systems, distributed systems, and program verification to build novel yet practical software tools that make life easier for the average application developer.

    • Jeff Running