Our group’s main mission is to improve the cost efficiency of Microsoft’s online services and datacenters. We pursue this mission by working closely with the company’s product groups to (1) propose and lead joint projects that improve efficiency, and (2) conduct research on potential future efficiency improvements.
Some of our main accomplishments have been:
- Our power capping and oversubscription software went into production in July 2018.
- Our tail latency mitigation techniques for HDFS (described in our EuroSys 2019 paper) went into production in June 2018.
- Resource Central, our ML and prediction-serving system for cloud platforms (described in our SOSP 2017 paper), went into production in March 2018.
- Router-Based HDFS Federation, our system for transparently scaling HDFS to datacenter sizes (described in our ATC 2017 paper), went into production in June 2017.
- CPU blind isolation for harvesting spare CPU cycles (described in our ATC 2018 paper) went into production in August 2016.
- Perflite, a tool for VM utilization analysis and optimization built from our Floodlight tool, went into production in February 2016.
- Our resource-harvesting YARN/HDFS stack and HDFS data placement algorithm for harvesting spare storage (described in our OSDI 2016 paper) went into production in January 2016.
- Our analysis of disk reliability (described in our award-winning FAST 2016 paper) prompted the adoption of a new ambient control policy for Microsoft’s free-cooling datacenters starting in 2015.
None of these successes would have been possible without our close partnership with teams in Azure, Bing, CO+I, CSI, and Windows/Hyper-V.