In this project, we are investigating the use of Machine Learning (ML) for improving computer systems (as opposed to mimicking human behavior) and, in particular, cloud platforms. As a first step in this direction, we built Resource Central, a general ML and prediction-serving system that we have deployed in all Azure Compute clusters worldwide. It trains ML models offline and uses them to produce predictions online. The predictions can be used by other Azure components to improve resource, performance, and availability management. For example, the server defragmentation engine and the VM scheduler are two of the platform components that already use predictions (e.g., VM lifetime, VM migration blackout/brownout times) from Resource Central in production. We have recently expanded Resource Central’s scope to Azure Networking, and our goal is to eventually integrate management activities from Azure Storage and Azure Data as well.
Resource Central is a close collaboration between MSR and Azure Compute.
People
Daniel S. Berger
Senior Researcher
Ricardo Bianchini
Distinguished Engineer
Anand Bonde
Senior Research SDE
Pedro Las-Casas
Research SDE 2
Rafael da Silva
Senior Research SDE