Autopilot: Automatic Data Center Management
Operating Systems Review | , Vol 41: pp. 60-67
Microsoft is rapidly increasing the number of large-scale web services that it operates. Services such as Windows Live Search and Windows Live Mail operate from data centers that contain tens or hundreds of thousands of computers, and it is essential that these data centers function reliably with minimal human intervention. This paper describes the first version of Autopilot, the automatic data center management infrastructure developed within Microsoft over the last few years. Autopilot is responsible for automating software provisioning and deployment; system monitoring; and carrying out repair actions to deal with faulty software and hardware. A key assumption underlying Autopilot is that the services built on it must be designed to be manageable. We also therefore outline the best practices adopted by applications that run on Autopilot.