Memory-Harvesting VMs in Cloud Platforms

Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), organized by ACM

Cloud platforms monetize their spare capacity by renting “Spot” virtual machines (VMs) that can be evicted in favor of higher-priority VMs. Recent work has shown that resource-harvesting VMs are more effective at exploiting spare capacity than Spot VMs, while also reducing the number of evictions. However, the prior work focused on harvesting CPU cores while keeping memory size fixed. This wastes a substantial monetization opportunity and may even limit the ability of harvesting VMs to leverage spare cores. Thus, in this paper, we explore memory harvesting and its challenges in real cloud platforms, namely its impact on VM creation time, NUMA spanning, and page fragmentation. We start by characterizing the amount and dynamics of the spare memory in Azure. We then design and implement memory-harvesting VMs (MHVMs), introducing new techniques for memory buffering, batching, and pre-reclamation. To demonstrate the use of MHVMs, we also extend a popular cluster scheduling framework (Hadoop) and a FaaS platform to adapt to them. Our main results show that (1) there is plenty of scope for memory harvesting in real platforms; (2) MHVMs are effective at mitigating the negative impacts of harvesting; and (3) our extensions of Hadoop and FaaS successfully hide the MHVMs’ varying memory size from the users’ data-processing jobs and functions. We conclude that memory harvesting has great potential for practical deployment and that users can save up to 93% of their costs when running workloads on MHVMs.
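
The abstract only names the buffering, batching, and pre-reclamation techniques; the following is a minimal, hypothetical Python sketch of how such a control policy could be modeled, not the paper's actual implementation. All names and parameters (HarvestController, pre_reclaim, the 4 GB buffer, the 2 GB batch size) are illustrative assumptions: the controller grants spare host memory to the harvesting VM in coarse batches, keeps a small buffer of memory free, and reclaims memory from the MHVM ahead of an incoming VM creation so the new VM does not wait on page reclamation.

```python
# Hypothetical sketch of a buffering/batching/pre-reclamation policy for an MHVM.
# Names, units, and thresholds are illustrative, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class HarvestController:
    """Tracks how much spare host memory an MHVM may use (all sizes in GB)."""
    host_spare_gb: float        # spare memory currently unused by regular VMs
    mhvm_grant_gb: float = 0.0  # memory currently granted to the harvesting VM
    buffer_gb: float = 4.0      # reserve kept free so new VMs can start without waiting
    batch_gb: float = 2.0       # grow/shrink the grant in coarse batches, not per page

    def target_grant(self) -> float:
        """Grant everything above the buffer, rounded down to a whole batch."""
        harvestable = max(self.host_spare_gb - self.buffer_gb, 0.0)
        return (harvestable // self.batch_gb) * self.batch_gb

    def rebalance(self) -> float:
        """Move the grant toward the target; returns the (signed) change in GB."""
        delta = self.target_grant() - self.mhvm_grant_gb
        self.mhvm_grant_gb += delta
        return delta

    def pre_reclaim(self, incoming_vm_gb: float) -> float:
        """Reclaim memory from the MHVM *before* an incoming VM is created,
        so that VM's creation does not block on page reclamation."""
        free_gb = self.host_spare_gb - self.mhvm_grant_gb
        shortfall = max(incoming_vm_gb - free_gb, 0.0)
        reclaimed = min(shortfall, self.mhvm_grant_gb)
        self.mhvm_grant_gb -= reclaimed
        return reclaimed


if __name__ == "__main__":
    ctl = HarvestController(host_spare_gb=20.0)
    print("initial grant:", ctl.rebalance())       # grants 16 GB (20 GB spare - 4 GB buffer)
    print("pre-reclaimed:", ctl.pre_reclaim(8.0))  # frees 4 GB ahead of an 8 GB VM request
```

In this toy model, batching coarse-grained grant changes limits the churn that contributes to page fragmentation, while the buffer and pre-reclamation step target the VM creation-time concern raised in the abstract.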