Azure Systems Research

Cloud systems innovation at the core of Azure


Azure Systems Research is a newly created research group that brings forward-looking systems research directly into the core of Azure. The group is seeded from the Cloud Efficiency team, which migrated from the Systems Research Group at Microsoft Research for closer integration with Azure.

Our group’s main mission is to improve the cost efficiency of Microsoft’s online services and datacenters. We pursue this mission by working closely with the company’s product groups to (1) propose and lead joint projects that improve efficiency, and (2) research potential future efficiency improvements.


Some of our main accomplishments have been:

  • Our power emergency management system (described in our ISCA 2021 paper), which allows datacenters to allocate all of their reserve/redundant power and host more servers, went into production in March 2021.
  • Our per-VM power capping software (described in our ATC 2021 paper) went into production in October 2020.
  • Our hybrid policy for managing cold starts in serverless platforms (described in our ATC 2020 paper) went into production in Azure Functions in June 2020.
  • Harvest VMs v1, for harvesting unallocated cores, and Harvest Hadoop, our modification of YARN and HDFS to benefit from Harvest VMs (described in our OSDI 2020 paper), went into production in November 2019.
  • Our power capping and oversubscription software went into production in July 2018.
  • Our tail latency mitigation techniques for HDFS (described in our EuroSys 2019 paper) went into production in June 2018.
  • Resource Central, our ML and prediction-serving system for cloud platforms (described in our SOSP 2017 paper), went into production in March 2018.
  • Router-Based HDFS Federation, our system for transparently scaling HDFS to datacenter sizes (described in our ATC 2017 paper), went into production in June 2017.
  • CPU blind isolation for harvesting spare CPU cycles (described in our ATC 2018 paper with the DMX group at MSR) went into production in August 2016.
  • Perflite, a tool for VM utilization analysis and optimization built from our Floodlight tool, went into production in February 2016.
  • Our resource-harvesting YARN/HDFS stack and HDFS data placement algorithm for harvesting spare storage (described in our OSDI 2016 paper) went into production in January 2016.
  • Our analysis of disk reliability (described in our FAST 2016 award paper) prompted the adoption of a new ambient control policy for Microsoft’s free-cooling datacenters starting in 2015.

None of these successes would have been possible without our close partnership with teams in Azure, Bing, CO+I, AHSI, and Windows/Hyper-V.