Failures and Other Challenges of Big-Data Analytics

  • Magdalena Balazinska | University of Washington

An important challenge faced by today’s big-data analytics systems is fault-tolerance: When running a parallel query at large scale, some form of failure is likely to occur during execution. Existing systems typically take one of two radically different strategies to handle failures: restart entire queries or materialize the output of each operator and restart only failed operator partitions. The former approach adds significant overhead when a failure occurs, while the latter adds overhead at runtime and typically introduces global synchronization barriers.

In this talk, we present FTOpt, a new approach for making online, parallel query plans fault-tolerant: FTOpt provides intra-query fault-tolerance without blocking. Additionally, it does so by using different fault-tolerance techniques at different operators within a query plan. Enabling each operator to use a different fault-tolerance strategy leads to a space of fault-tolerance plans amenable to cost-based optimization. FTOpt comprises a protocol for mixing and matching fault-tolerance techniques within a single query plan and an optimizer for selecting the technique to use at each operator in order to minimize the expected processing time with failures for the entire query. Experiments show that with as little as one failure, the choice of fault-tolerance approach can result in a 70% difference in query runtimes, that hybrid query plans often lead to the best performance, and that our optimizer is able to select a winning plan.
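To make the idea of cost-based selection concrete, here is a minimal sketch of choosing a fault-tolerance strategy per operator by enumerating plans and minimizing expected runtime. It is not FTOpt's cost model or protocol: the two strategies ("none", "materialize"), the linear pipeline, the at-most-one-failure assumption, and the names `expected_runtime` and `best_plan` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical strategy set for this sketch; the abstract does not list FTOpt's.
STRATEGIES = ("none", "materialize")


def expected_runtime(run_times, mat_costs, plan, failure_prob):
    """Expected time of a linear operator pipeline with at most one failure.

    Assumptions of this toy model: a failure hits operator k with probability
    proportional to its run time, and recovery re-runs every operator from the
    nearest upstream materialized output through operator k.
    """
    total = sum(run_times)
    # Failure-free cost: run every operator, plus overhead where we materialize.
    base = total + sum(c for c, s in zip(mat_costs, plan) if s == "materialize")
    expected_recovery = 0.0
    for k in range(len(run_times)):
        start = k
        # Walk upstream until an operator whose output was materialized.
        while start > 0 and plan[start - 1] != "materialize":
            start -= 1
        redo = sum(run_times[start:k + 1])
        expected_recovery += failure_prob * (run_times[k] / total) * redo
    return base + expected_recovery


def best_plan(run_times, mat_costs, failure_prob):
    """Exhaustively search all per-operator strategy assignments."""
    plans = product(STRATEGIES, repeat=len(run_times))
    return min(
        plans,
        key=lambda p: expected_runtime(run_times, mat_costs, p, failure_prob),
    )


if __name__ == "__main__":
    # Three operators of equal cost; materializing the middle one is expensive.
    print(best_plan([10, 10, 10], [1, 25, 1], failure_prob=1.0))
```

In this toy model, the cheapest plan under one certain failure materializes only the first operator, a hybrid plan, while the cheapest plan with no failures materializes nothing. Exhaustive enumeration is used only because the example is tiny; a real optimizer would need a more scalable search.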

In addition to FTOpt, we will present a broad overview of other research challenges tackled by the ongoing Nuage, CQMS, and Data Eco$y$tem projects at the University of Washington.

Speaker Details

Magdalena Balazinska is an Assistant Professor in the Department of Computer Science and Engineering at the University of Washington. Magdalena’s research interests are broadly in the fields of databases and distributed systems. Her current research focuses on big-data analytics, sensor and scientific data management, and cloud computing. Magdalena holds a PhD from the Massachusetts Institute of Technology (2006). She is a Microsoft Research New Faculty Fellow (2007) and has received an NSF CAREER Award (2009), a 10-year most influential paper award (2010), an HP Labs Research Innovation Award (2009-2011), a Rogel Faculty Support Award (2006), a Microsoft Research Graduate Fellowship (2003-2005), and several best-paper awards (2002, 2010, and 2011).

  • Jeff Running

Series: Microsoft Research Talks