In multi-tenant cloud environments, the absence of strict network performance guarantees leads to unpredictable job execution times. Several recent proposals address this issue by providing guaranteed network performance. These proposals, however, require resource reservation schedules to be computed a priori. This is impractical in today's cloud environments, where application demands are inherently unpredictable, e.g., because of variations in input datasets or phenomena such as failures and stragglers.

To overcome these limitations, we designed Kraken, a system that allows tenants to dynamically request and update minimum guarantees for both network bandwidth and compute resources at runtime. Unlike previous work, Kraken requires no prior knowledge of the resource needs of tenants' applications; instead, tenants can adjust their reservations as those needs change. Kraken achieves this through an online resource reservation scheme that comes with provable optimality guarantees.

In this paper, we motivate the need for dynamic resource reservation schemes, describe how Kraken provides them, and evaluate Kraken through extensive simulations.