Our group’s main mission is to improve the cost efficiency of Microsoft’s online services and datacenters. We pursue this mission by working closely with the company’s product groups to (1) propose and lead joint projects that improve efficiency, and (2) conduct research on potential future efficiency improvements.

Some of our main accomplishments have been:

  • Our power emergency management system (described in our ISCA 2021 paper), which allows datacenters to allocate all of their reserve/redundant power and host more servers, went into production in March 2021.
  • Our per-VM power capping software (described in our ATC 2021 paper) went into production in October 2020.
  • Our hybrid policy for managing cold starts in serverless platforms (described in our ATC 2020 paper) went into production in Azure Functions in June 2020.
  • Harvest VMs v1, for harvesting unallocated cores, and Harvest Hadoop, our modification of YARN and HDFS to benefit from Harvest VMs (described in our OSDI 2020 paper), went into production in November 2019.
  • Our power capping and oversubscription software went into production in July 2018.
  • Our tail latency mitigation techniques for HDFS (described in our EuroSys 2019 paper) went into production in June 2018.
  • Resource Central, our ML and prediction-serving system for cloud platforms (described in our SOSP 2017 paper), went into production in March 2018.
  • Router-Based HDFS Federation, our system for transparently scaling HDFS to datacenter sizes (described in our ATC 2017 paper), went into production in June 2017.
  • CPU blind isolation for harvesting spare CPU cycles (described in our ATC 2018 paper with the DMX group at MSR) went into production in August 2016.
  • Perflite, a tool for VM utilization analysis and optimization built from our Floodlight tool, went into production in February 2016.
  • Our resource-harvesting YARN/HDFS stack and HDFS data placement algorithm for harvesting spare storage (described in our OSDI 2016 paper) went into production in January 2016.
  • Our analysis of disk reliability (described in our FAST 2016 award paper) prompted the adoption of a new ambient control policy for Microsoft’s free-cooling datacenters starting in 2015.

None of these successes would have been possible without our close partnership with teams in Azure, Bing, CO+I, AHSI, and Windows/Hyper-V.

 

People

Ricardo Bianchini

Distinguished Engineer

Kapil Arya

Senior Research SDE

Daniel Berger

Senior Researcher

Anand Bonde

Senior Research SDE

Gohar Irfan Chaudhry

Research SDE 2

Esha Choukse

Researcher

Dan Crankshaw

Senior Research SDE

Sameh Elnikety

Principal Researcher

Felipe Vieira Frujeri

Senior Applied Scientist

Íñigo Goiri

Principal Research SDE

Celine Irvene

Research SDE 2

Alok Kumbhare

Senior Research SDE

Pedro Las-Casas

Research SDE 2

Pulkit Misra

Senior Research SDE

Stanko Novakovic

Senior Research Software Engineer

Rafael da Silva

Senior Research SDE

Pantea Zardoshti

Research SDE 2

Sam Whitlock

Research SDE 2