How Microsoft is transforming its own patch management with Azure

Sep 20, 2019

At Microsoft Core Service Engineering and Operations (CSEO), patch management is key to our server security practices. That’s why we set out to transform our operational model with scalable DevOps solutions that still maintain enterprise-level governance. Now, CSEO uses Azure Update Management to patch tens of thousands of our servers across the global Microsoft ecosystem, both on premises and in the cloud, in Windows and in Linux.

With Azure Update Management, we have a scalable model that empowers engineering teams to take ownership of their server updates and patching operations, giving them the agility they need to run services according to their specific business needs. We’ve left our legacy processes behind, and we have been meeting our patch compliance goals month after month since implementing our new, decentralized DevOps approach. Here’s an overview of how we completed the transformation.

[For more details please view our IT Showcase webinar, How Microsoft uses Azure to cloud-power the patching and config management in their IT Datacenter.]

The journey to Azure Update Management

Back in January 2017, the CSEO Manageability Team started transitioning away from the existing centralized IT patching service and its use of Microsoft System Center Configuration Manager. We planned a move to a decentralized DevOps model to reduce operations costs, simplify the service and increase its agility, and enable the use of native Azure solutions.

CSEO was looking for a solution that would provide overall visibility and scalable patching while also enabling engineering teams to patch and operate their servers in a DevOps model. Because patch management is key to our server security practices, we needed a tool with the feature set and scale to manage server updates across the CSEO environment, and Azure Update Management provided both.

Azure Update Management can manage Linux and Windows servers, both on premises and in the cloud, and provides:

  • At-scale assessment capabilities
  • Scheduled updates within specified maintenance windows
  • Logging to troubleshoot update failures

We also took advantage of new advanced capabilities, including:

  • Maintenance windows that distinguish and identify servers in Azure based on subscriptions, resource groups, and tags
  • Pre/post scripts that run before and after the maintenance window to start turned-off servers, patch them, and then turn them off again
  • Control over server reboot options
  • Inclusion or exclusion of specific patches

[Figure: The solution architecture for the Azure Update Management implementation. In the upper left, the enterprise workspace is shown, connecting to business units below and organizations to the right. In the upper right, the enterprise section includes security monitoring, patch governance, and data collection. The lower right is the DevOps section, which includes patch deployment, change configuration tracking, application monitoring and alerts, and performance analytics.]
This graphic demonstrates the solution architecture for our complex CSEO environment.
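
To make the advanced capabilities listed above more concrete, here is a minimal sketch of how a maintenance window might select servers by subscription, resource group, and tags, and then wrap patching in pre and post steps that start and stop powered-off servers. It is plain Python over in-memory data, not the Azure Update Management API; the `Server` class, the `matches` and `run_window` functions, and the sample tag names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical inventory record; field names are assumptions.
    name: str
    subscription: str
    resource_group: str
    tags: dict = field(default_factory=dict)
    running: bool = True


def matches(server, subscriptions=None, resource_groups=None, tags=None):
    """Return True if the server satisfies every filter that was supplied."""
    if subscriptions and server.subscription not in subscriptions:
        return False
    if resource_groups and server.resource_group not in resource_groups:
        return False
    # Each requested tag must be present with one of the allowed values.
    for key, allowed in (tags or {}).items():
        if server.tags.get(key) not in allowed:
            return False
    return True


def run_window(servers, patch, **filters):
    """Start stopped targets, patch every target, then stop them again."""
    targets = [s for s in servers if matches(s, **filters)]
    started = [s for s in targets if not s.running]
    for s in started:  # pre step: power on servers that were off
        s.running = True
    for s in targets:  # the patching itself is supplied by the caller
        patch(s)
    for s in started:  # post step: return those servers to the off state
        s.running = False
    return [s.name for s in targets]
```

A caller would pass the filters as keyword arguments, for example `run_window(fleet, patch_fn, tags={"PatchWindow": {"Sat-0200"}})`, where `PatchWindow` is an invented tag name. Keeping the list of servers that the pre step started is what lets the post step turn off only those servers and leave the rest running.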

Completing that transformation with Azure Update Management required the Manageability Team to achieve three main goals:

  • Enhance compliance reporting to give engineering teams a reliable and accurate “source of truth” for patch compliance.
  • Ensure that 95 percent of the total server population in the datacenter would be compliant for all vulnerabilities being scanned, enabling a clean transfer of patching duties to application engineering teams.
  • Implement a solution that could patch at enterprise scale.

CSEO enhanced reporting capabilities by creating a Power BI report that married compliance scan results with the necessary configuration management database details. This provided a view on both current and past patch cycle compliance, setting a point-in-time measure within the broader context of historic trends. Engineers were now able to quickly and accurately remediate without wasting time and resources.
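
The join behind such a report can be sketched in a few lines. The example below is an illustration only, not the actual Power BI model: it assumes scan results arrive as a mapping from server name to a list of missing KBs, and that the configuration management database can be reduced to a mapping from server name to owning team. The function names and data shapes are hypothetical.

```python
from collections import defaultdict

TARGET = 0.95  # the 95 percent goal described in the article


def compliance_by_team(scan_results, cmdb):
    """Join scan results to CMDB owners; return {team: compliant fraction}."""
    totals = defaultdict(int)
    compliant = defaultdict(int)
    for server, missing_kbs in scan_results.items():
        # Servers absent from the CMDB are grouped under "unknown".
        team = cmdb.get(server, "unknown")
        totals[team] += 1
        if not missing_kbs:  # no missing KBs means the server is compliant
            compliant[team] += 1
    return {team: compliant[team] / totals[team] for team in totals}


def meets_target(scan_results, target=TARGET):
    """Return True if the compliant share of all servers reaches the target."""
    if not scan_results:
        return True  # nothing scanned, so nothing is out of compliance
    clean = sum(1 for kbs in scan_results.values() if not kbs)
    return clean / len(scan_results) >= target
```

Grouping unmatched servers under an explicit `"unknown"` team, rather than dropping them, keeps gaps in the configuration data visible in the same report.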

The report also included 30-day trend tracking and knowledge base (KB)-level reporting. In addition, the Manageability Team gathered feedback from engineering groups and used it to enhance the dashboard, for example by adding pending KB numbers on noncompliant servers and information about how long a patch had been pending on a server.

We focused on achieving that 95 percent key performance indicator by “force remediating” older vulnerabilities first, which meant upgrading or uninstalling older applications. With Configuration Manager consistently landing patches each cycle, engineering teams began to consistently meet the 95 percent goal.

Finally, as a native Azure solution available directly through the Azure portal, Azure Update Management provided the flexibility and features needed for engineering teams to remediate vulnerabilities while meeting these goals at scale.

Decoding our transformation

In the past, “white glove” application servers required additional coordination or extra steps during patching, like removing a server from network load balancing or stopping a service before patches could be applied. The traditional system typically required a patching team to coordinate patch deployment with the team that owned the application, all to ensure that the application would not be affected by recently installed patches.

We implemented a number of changes to transition smoothly from that centralized patching service to using Azure Update Management as our enterprise solution. Our first step was to deliver demos to help engineering teams learn to use Azure Update Management. These sessions covered everything from the prerequisites necessary to enable the solution in Azure to how to schedule servers, apply patches, and troubleshoot failures.

The Manageability Team also drew from its own experience getting started with Azure Update Management to create a toolkit to help engineering teams make the same transition. The toolkit provided prerequisite scripts for tasks like adding the Microsoft Monitoring Agent extension and creating an Azure Log Analytics workspace. It also contained a script to set up Azure Security Center when teams had already created default workspaces. Because Azure Update Management supports only one automation account and Log Analytics workspace, the script cleaned up the automation account and linked it to the workspace used for patching.

Next, the Manageability Team took on proving scalability across the datacenter environment. The goal was to take a subset of servers from the centralized patching service in Configuration Manager and patch them through Azure Update Management. They created Scheduled Deployments within the Azure Update Management solution that used the same maintenance windows as those used by Configuration Manager. After validating the servers’ prerequisites, they moved the servers into the deployments so that during that maintenance window, Azure Update Management was patching the servers instead of Configuration Manager.
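
The planning step in that exercise can be sketched as follows. This is an assumption-laden illustration, not the team's tooling: each server is a plain dictionary, the prerequisite checks are reduced to two hypothetical boolean fields (`agent_installed` and `workspace_linked`), and `window` holds the label of the maintenance window the server already had in Configuration Manager.

```python
def plan_migration(servers):
    """Group ready servers by their existing window; list the rest as blocked."""
    deployments = {}
    blocked = []
    for s in servers:
        # Only servers that pass both prerequisite checks are moved.
        if s["agent_installed"] and s["workspace_linked"]:
            # Reuse the existing window label as the deployment key so the
            # patching schedule does not change for the application team.
            deployments.setdefault(s["window"], []).append(s["name"])
        else:
            blocked.append(s["name"])
    return deployments, blocked
```

Returning the blocked servers separately gives a concrete follow-up list for prerequisite work, while everything else keeps the schedule it had before the move.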

With that successful scalability exercise completed, the final step was to turn off Configuration Manager as the centralized service’s “patching engine.” CSEO had set a specific deadline for this transformation, and right on time the team turned off the Software Update Manager policy in Configuration Manager. This ensured that Configuration Manager would no longer be used for patching activities, but would still be available for other functionality.

After the transition was complete, the Manageability Team monitored closely to ensure that decentralization did not negatively affect compliance. In almost every month since the transition, the CSEO organization has achieved the 95 percent compliance goal.

[Learn more about how we’re tackling Microsoft Azure governance inside Microsoft.]

Refining update management

We’re now hard at work on the next evolution in our Azure Update Management journey to further optimize operational costs, accelerate patch compliance, and improve the end-to-end patching experience. Most recently, we’ve implemented automated notifications that send emails and create tickets when servers are not compliant, so that teams can quickly remediate.
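
As a rough illustration of what such a notification step might compute, the sketch below builds one record per noncompliant server, including its pending KBs and how long the oldest one has been outstanding. It is hypothetical throughout: the data shapes, the `build_notifications` name, and the record fields are assumptions, and the actual sending of email or creation of tickets is deliberately left out.

```python
from datetime import date


def build_notifications(pending, owners, today):
    """Return one notification record per server that has pending KBs.

    pending: {server: {kb_id: date the KB was first reported missing}}
    owners:  {server: contact for the owning team}
    """
    notes = []
    for server, kbs in sorted(pending.items()):
        if not kbs:
            continue  # compliant servers generate no notification
        oldest = min(kbs.values())
        notes.append({
            "server": server,
            "to": owners.get(server, "unassigned"),
            "kbs": sorted(kbs),
            "days_pending": (today - oldest).days,
        })
    return notes
```

The same record could feed both an email and a ticket, which keeps the two channels consistent with each other.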

CSEO will continue to build tools and automation that improve the patching experience and increase compliance. We’re evaluating, adapting, and providing our engineering teams with guidance as new features are released into the Azure Update Management service.

Learn more about governance inside CSEO here: Enabling enterprise governance in Azure.

For more details please view our IT Showcase webinar: How Microsoft uses Azure to cloud-power the patching and config management in their IT Datacenter.

To learn more about Azure Update Management, visit Azure Docs.

 
