Omega and Firmament: flexible, scalable and heterogeneity-aware cluster scheduling

  • Malte Schwarzkopf | University of Cambridge

Current monolithic cluster scheduler architectures struggle to keep up with increasing scale and with the need to respond rapidly to changing requirements. Omega is a novel architecture for running multiple cluster schedulers atop a shared infrastructure, addressing these needs using parallelism, shared state, and lock-free optimistic concurrency.
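
As a rough illustration of optimistic concurrency over shared state, the sketch below lets each scheduler decide on a private snapshot and commit only if the chosen machine has not changed in the meantime. It is a single-threaded toy, not Omega's implementation: the names `CellState`, `Machine`, `try_commit` and `schedule` are hypothetical, and a real system would need the commit step to be atomic.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    free_cpus: int
    version: int = 0  # bumped on every successful commit


@dataclass
class CellState:
    """Hypothetical shared cluster state that every scheduler can read."""
    machines: dict = field(default_factory=dict)

    def snapshot(self):
        # Each scheduler decides on a private copy, so nothing is locked meanwhile.
        return {n: Machine(m.free_cpus, m.version) for n, m in self.machines.items()}

    def try_commit(self, name, cpus, seen_version):
        # The claim succeeds only if the machine is unchanged since the snapshot.
        # (Assumed to run atomically in a real system.)
        m = self.machines[name]
        if m.version != seen_version or m.free_cpus < cpus:
            return False
        m.free_cpus -= cpus
        m.version += 1
        return True


def schedule(cell, cpus, max_attempts=3):
    """Pick the emptiest machine in a snapshot; retry on a commit conflict."""
    for _ in range(max_attempts):
        view = cell.snapshot()
        candidates = [n for n, m in view.items() if m.free_cpus >= cpus]
        if not candidates:
            return None
        choice = max(candidates, key=lambda n: view[n].free_cpus)
        if cell.try_commit(choice, cpus, view[choice].version):
            return choice
    return None


# Two schedulers read the same state and both want machine m1.
cell = CellState({"m1": Machine(4), "m2": Machine(2)})
view_a = cell.snapshot()
view_b = cell.snapshot()
ok_a = cell.try_commit("m1", 3, view_a["m1"].version)  # first commit wins
ok_b = cell.try_commit("m1", 3, view_b["m1"].version)  # stale version: conflict
retry = schedule(cell, 2)  # the losing scheduler retries on a fresh snapshot
```

The second commit is rejected because its snapshot is stale, which is the kind of scheduler interference the talk evaluates; the retry then lands on a machine that still has room.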

In this talk, I will compare the Omega approach to existing cluster scheduler designs, and evaluate the impact of scheduler interference. Using real Google workloads, I will show that a shared-state approach is viable, and that it enables custom scheduling logic, such as a MapReduce scheduler that opportunistically uses spare resources.

I will also describe some newer, ongoing work: the Firmament scheduler, which adapts a Quincy-style min-cost max-flow optimization to make good scheduling decisions in the face of heterogeneous cluster resources and workload interference.
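
To make the flow formulation concrete, here is a minimal sketch of scheduling as min-cost max-flow: each task supplies one unit of flow, which drains to the sink either through a machine or through an "unscheduled" node with a high cost. The solver is a plain successive-shortest-paths routine, and the `FlowNetwork` class, the node names and all costs are invented for illustration; they are not Firmament's or Quincy's actual cost model.

```python
from collections import deque


class FlowNetwork:
    def __init__(self):
        self.graph = {}   # node -> indices of edges leaving it
        self.edges = []   # [to, residual capacity, cost]; edge i ^ 1 is its reverse

    def add_edge(self, u, v, capacity, cost):
        for node in (u, v):
            self.graph.setdefault(node, [])
        self.graph[u].append(len(self.edges))
        self.edges.append([v, capacity, cost])
        self.graph[v].append(len(self.edges))
        self.edges.append([u, 0, -cost])

    def min_cost_max_flow(self, source, sink):
        flow = cost = 0
        while True:
            # Queue-based Bellman-Ford: cheapest augmenting path in the residual graph.
            dist = {n: float("inf") for n in self.graph}
            dist[source] = 0
            parent = {}
            queue, queued = deque([source]), {source}
            while queue:
                u = queue.popleft()
                queued.discard(u)
                for i in self.graph[u]:
                    v, cap, c = self.edges[i]
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        parent[v] = i
                        if v not in queued:
                            queue.append(v)
                            queued.add(v)
            if dist[sink] == float("inf"):
                return flow, cost
            # Find the bottleneck along the path, then push that much flow.
            push, v = float("inf"), sink
            while v != source:
                push = min(push, self.edges[parent[v]][1])
                v = self.edges[parent[v] ^ 1][0]
            v = sink
            while v != source:
                i = parent[v]
                self.edges[i][1] -= push
                self.edges[i ^ 1][1] += push
                v = self.edges[i ^ 1][0]
            flow += push
            cost += push * dist[sink]

    def saturated_targets(self, node):
        # Forward edges sit at even indices; zero residual means the edge carries flow.
        return [self.edges[i][0] for i in self.graph[node]
                if i % 2 == 0 and self.edges[i][1] == 0]


# Hypothetical per-task placement costs (lower means a better fit for that machine).
costs = {
    "t1": {"m1": 1, "m2": 4},
    "t2": {"m1": 2, "m2": 3},
    "t3": {"m1": 2, "m2": 5},
}
net = FlowNetwork()
for task, prefs in costs.items():
    net.add_edge("source", task, 1, 0)
    for machine, c in prefs.items():
        net.add_edge(task, machine, 1, c)
    net.add_edge(task, "unscheduled", 1, 10)  # leaving a task waiting is expensive
for machine in ("m1", "m2"):
    net.add_edge(machine, "sink", 1, 0)       # one free slot per machine
net.add_edge("unscheduled", "sink", len(costs), 0)

total_flow, total_cost = net.min_cost_max_flow("source", "sink")
placement = {task: net.saturated_targets(task)[0] for task in costs}
```

With two slots and three tasks, the cheapest solution places two tasks and leaves one unscheduled. In a sketch like this, heterogeneity or interference would be expressed by changing the per-edge costs.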

Speaker Details

Malte Schwarzkopf is a final-year PhD student at the University of Cambridge Computer Laboratory. His interests are in the areas of distributed systems and operating systems, and especially where they overlap. After being involved in the CIEL project (at Cambridge) and working on the Omega cluster scheduler (at Google), he is currently leading two related projects at the Computer Laboratory: the Firmament heterogeneity-aware scheduler and DIOS, a revisit of distributed operating systems in the context of modern data centers.

  • Jeff Running

Series: Microsoft Research Talks