Server Security Patch Management at Microsoft

Technical White Paper

Published: February 4, 2004

Sharing the Microsoft IT Experiences


Situation

Organizations that cannot determine and maintain a known level of patch management for operating systems and application software might have a number of security vulnerabilities that, if exploited, could lead to loss of revenue and intellectual property.

Solution

Minimizing the threat of vulnerabilities requires organizations to have properly configured systems, to use the latest software, and to install the recommended software updates. Assessing and maintaining the integrity of software in a networked environment through a well-defined patch management program is a key first step toward successful information security. The IT group inside Microsoft uses Microsoft Systems Management Server (SMS) 2003 as the primary tool in its server patch management process.

Benefits

  • Automated deployment of security updates and applications
  • Central reporting and administration
  • More accurate and efficient update management
  • Reduction in manual effort to patch servers

Products & Technologies

  • Microsoft Windows Server 2003
  • Microsoft SQL Server 2000 SP3a
  • Systems Management Server 2003
  • Microsoft Baseline Security Analyzer (MBSA) version 1.2
  • Change management database (IT Configuration DB)

Executive Summary

Assessing and maintaining the integrity of software in a networked environment through a well-defined patch management program is a key first step toward successful information security.

Patch management is a process that gives organizations control over the deployment and maintenance of interim software releases into their production environments. It helps organizations maintain operational efficiency and effectiveness, and it helps improve the security and stability of the production environment.

The information technology group within Microsoft (known as Microsoft IT) uses Microsoft® Systems Management Server (SMS) 2003 to:

  • Manage the application deployment process.
  • Improve asset management for both hardware and software.
  • Better manage mobile clients within the Microsoft network.
  • Help manage the deployment of security updates across the enterprise.

This white paper focuses on Microsoft IT's early adoption of SMS 2003 to assist with server patch management, including the deployment of security updates, in its production environment. The technical elements covered in this paper include:

  • The organization and structure of the data-center network and the levels of support to manage data-center servers.
  • The server patching architecture at Microsoft.
  • The administrative structure required to support the process of managing security updates for servers.
  • The key phases of the patch management process: monitoring for security bulletins and updates, determining the risk level, testing an update, deploying an update, and checking reports on the success of the deployment.
  • A comparison of the different timetables for standard and emergency updates.
  • The lessons learned and best practices for security patch management.

Customers frequently ask Microsoft IT about the methods employed and lessons learned when Microsoft software is deployed internally. This white paper provides an inside look at how Microsoft IT manages security updates at the server level with SMS 2003, including a discussion of the procedural and staffing decisions necessary to ensure efficient server patch management.

For information about how Microsoft IT manages both security updates and application updates for client (desktop) computers, see the IT Showcase white paper Desktop Patch Management with Microsoft Systems Management Server 2003.

This paper is written for enterprise technical decision makers who want to take advantage of the patch management features that SMS 2003 provides. This paper is based on Microsoft IT operational experience as an early adopter. It is not intended to serve as a procedural guide. Each enterprise environment is composed of unique circumstances; therefore, each organization should adapt the plans and lessons learned described in this paper to meet its specific needs.

Readers of this paper should be familiar with the Microsoft Operations Framework (MOF) Process, Team, and Risk models. For more information about these models, see http://www.microsoft.com/business/reducecosts/efficiency/manageability/default.mspx or http://www.microsoft.com/technet/itsolutions/cits/mo/mof/default.mspx.

Note
   
This paper uses the term patch management to label the process of fixing software vulnerabilities on computers in a general sense. The term update refers to the object of the process—the package that is deployed in patch management.

Microsoft IT's patch management framework follows Microsoft Solutions for Management (MSM) recommendations. The framework consists of the following steps:

  1. Assess the environment to be patched on an ongoing basis by understanding the baseline of system security, constantly reviewing the patch management architecture, managing SMS client breadth and health, and conducting inventories.
  2. Identify new patches, determine their relevance to the environment, verify their authenticity, install each one on an isolated system, and determine the enforcement time frame.
  3. Evaluate and plan patch deployment to ensure that patch testing, risk assessment, and patch release processes are all in place.
  4. Deploy the patch, which includes distributing and installing the patch, monitoring and reporting on progress and success or failures of the patch, dealing with exceptions, and conducting a review of the deployment to continue to evolve the patch management process.
Note   More information about this framework is available on the MSM website. See Chapter 2, "Solution Guidance," in Patch Management Using Microsoft Systems Management Server 2003.

Server patch management falls under three discrete MOF processes, shown in Table 1.

Table 1   MOF Processes

Process                    Process objective
Change management          Avoidance of change-induced service disruptions
Release management         Coordination and installation of correct, authorized, and tested software and hardware
Configuration management   Storage and retrieval of up-to-date configuration information, in support of activities for other processes

Introduction

The Microsoft IT group is responsible for managing IT services and a challenging computing environment for more than 55,000 employees and more than 300,000 devices that span over 400 sites worldwide. Over 300 of the sites are sales and marketing offices distributed in major worldwide cities. IT-managed infrastructure exists at over 200 of those sites.

Because Microsoft is a large enterprise that develops and markets software, the Microsoft IT infrastructure is much larger than is typical of other corporations with a similar number of employees, contractors, and vendors. For example, Microsoft has two to three times more computers and other devices (such as Smartphones and Pocket PC devices) than personnel. Microsoft IT manages more than 120,000 desktop computers and portable computers spread among the production, product development, test, and support organizations.

Microsoft IT consists of more than 3,500 staff members who are responsible for managing the IT utility for the company. In addition, Microsoft IT plays a key role in helping the company meet its main business objective of software development and marketing. Microsoft IT serves as an early adopter of new Microsoft software, such as Microsoft Windows Server™ 2003, Microsoft Office 2003, and Systems Management Server 2003. This process is known internally as "eating our own dog food."

The early deployment of technology and continual growth at Microsoft result in a highly dynamic environment. The environment houses more than 6,000 servers that provide essential services, including 1,600 line-of-business applications. These applications range from a single SAP R/3 instance to specialized departmental or even workgroup applications for groups such as research, product support, and product development, and they are spread across four different Active Directory® directory service forests.

Note
   
This paper is based on the infrastructure of the primary production forest and excludes references to the extranet, research, product development, testing, and support infrastructures, except where noted.

Servers in the primary production data center provide many mission-critical functions with service level agreements (SLAs) for uptime greater than 99.9 percent. Minimizing unplanned server downtime is a key operational and server patch management requirement. Strictly managing the timing for planned downtime is also a key requirement, especially for the many clustered servers. In addition, Microsoft IT manages to a goal of around 200 servers per administrator and budgets no additional headcount for the rising trend in the number of server updates.

Additional challenges in the Microsoft security environment include:

  • As many as 2,500 unique attacks, probes, and scans occur on a daily basis.
  • Each month, Microsoft probes, scans, and quarantines over 125,000 virus-infected e-mail messages.
  • Unique IT environments for product development, testing, support, and research require special security.
  • Most Microsoft employees are highly technology literate and routinely explore the limits of the tools available to them in order to improve product quality. For example, more than 95 percent of Microsoft employees have local administrator rights to their desktops. Some employees even run server operating systems on their desktop computers for various development, testing, and product support purposes. For security patch management purposes, these computers are managed the same way as client (desktop) computers.

This combination of factors—an evolving security landscape full of potential vulnerabilities operating across a large, dynamic, and demanding IT environment—presents a challenging array of variables for the server management IT function to manage.

In addition, making sure that an update, especially a security update, reaches only its intended targets is essential so that conflicts do not arise between the update and other software versions for which it was not intended. Microsoft IT requires that patch installation fix the problem without creating side effects or negative interactions.

To help address these issues, Microsoft IT turned to SMS 2003 to help manage the computing environment at Microsoft. SMS 2003 provides Microsoft IT with:

  • Inventory functions to determine how many computers have been deployed, their locations, their roles, and the software applications and updates that have been installed.
  • Scheduling functions that allow scheduled deployment for updates outside regular working hours, or at a time that has the least impact on business operations.
  • The Distribute Software Updates Wizard, which enables administrators to rapidly select and deploy software distributions, such as security updates, to specific groups of computers, such as servers.
  • Status reporting that enables patch administrators to monitor the progress and assess the success of installation.

The server patching process prior to SMS 2003 was labor intensive. For each update, one or more Microsoft IT professionals had to write a new distribution script. The script had to reach only the computers for which the update was intended, determine whether the update was already installed on each target computer, and then install it as needed. After the script was written, it had to be tested for integrity and then tested again on a representative selection of servers to ensure that it would work. Before SMS 2003, Microsoft IT also had no single mechanism for inventorying its servers, the operating system versions and service packs they ran, and whether they had been patched.
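Each of those hand-written scripts had to perform the same three checks described above. The following minimal sketch shows that per-update logic in Python; the `Server` record, the `applies_to` set of (operating system, service pack) pairs, and the update ID `UPDATE-001` are all hypothetical, and the actual installer launch is replaced by a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical inventory record for one target server.
    name: str
    os_version: str
    service_pack: str
    installed_updates: set = field(default_factory=set)


def deploy_update(servers, update_id, applies_to):
    """Install update_id on each intended server that does not already have it.

    applies_to is a set of (os_version, service_pack) pairs the update was
    written for; every other server is skipped so the update never reaches
    software versions for which it was not intended.
    """
    results = {}
    for server in servers:
        if (server.os_version, server.service_pack) not in applies_to:
            results[server.name] = "not targeted"
        elif update_id in server.installed_updates:
            results[server.name] = "already installed"
        else:
            # A real script would launch the update installer here.
            server.installed_updates.add(update_id)
            results[server.name] = "installed"
    return results
```

With SMS 2003 this targeting and detection is handled by the product, which is why a new script no longer has to be written and tested for every update.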

Note
   
For more detailed procedural information about updating servers with SMS 2003, based on the Microsoft IT group experience, see Microsoft Solution Accelerator for Patch Management Using Microsoft Systems Management Server 2003.

Business Benefits of Server Patch Management with SMS 2003

Automated Security Update and Application Deployment

With the ability of SMS 2003 to enforce security update installation, Microsoft IT can ensure that servers are updated within prescribed time frames. By creating server collections that are coordinated with the servers' time zones and optimum times for automatic reboots, the Microsoft IT patch administrator can use SMS 2003 to update the servers in the data center automatically with minimal impact on the business.

Central Reporting and Administration

SMS 2003 centralizes the reporting and administration of complex networks of computers by consolidating information about network assets, including servers, their locations, and the software running on them, into a single database. This provides Microsoft IT with a unified, up-to-date inventory of its IT infrastructure, combined with a centralized set of tools for monitoring, administering, and updating any combination of servers in the network. This centralization also establishes a clear communication path that helps software deployments and security updates reach the intended servers, and it helps ensure that the servers report the results of each deployment attempt back to the SMS server.

More Accurate and Efficient Patch Management

To meet the challenge of an increasing number of updates, SMS 2003 offers a number of features to enable deploying more updates by fewer administrators in less time. The inventorying features enable an administrator to categorize computers into any number of groups based, for example, on what server tasks they perform, what OS and service packs they run, their physical location, and which time zone they share.
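Grouping computers by inventory attributes amounts to partitioning records on a composite key. The following is a minimal Python sketch of that idea, not the SMS collection mechanism itself; the attribute names (`role`, `time_zone`) and server names are hypothetical stand-ins for inventory data.

```python
from collections import defaultdict


def build_collections(servers, *keys):
    """Group server records (dicts) into collections keyed by the given attributes.

    Each collection key is a tuple of attribute values, so grouping by
    ("role", "time_zone") yields one collection per role/time-zone pair.
    """
    collections = defaultdict(list)
    for server in servers:
        collection_key = tuple(server[k] for k in keys)
        collections[collection_key].append(server["name"])
    return dict(collections)


servers = [
    {"name": "dc01", "role": "domain controller", "time_zone": "Pacific"},
    {"name": "dc02", "role": "domain controller", "time_zone": "GMT"},
    {"name": "web01", "role": "web", "time_zone": "Pacific"},
]
by_time_zone = build_collections(servers, "time_zone")
```

Here `by_time_zone` maps `("Pacific",)` to `["dc01", "web01"]` and `("GMT",)` to `["dc02"]`; adding more keys simply produces finer-grained collections.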

Reduction in Manual Effort to Deploy Updates

SMS 2003 offers many automated tools to group targeted computers easily and to deploy software updates. In addition, integration with Active Directory allows IT administrators to reuse information in that repository for targeting users and computers. With SMS 2003, Microsoft IT spends less time writing scripts, testing scripts, and testing deployments. For example, before adopting SMS 2003, Microsoft IT needed three to five IT professionals per shift, 24 hours per day, for update management. As of this writing, using SMS 2003, one person per shift does the same job.

Background

The server patch management process at Microsoft combines people, processes, and technology. Each element influences the others, and all of them evolved over time from the business needs and technology architecture at Microsoft. To judge whether the best practice recommendations and lessons learned in the final part of this paper apply to your organization, it is useful to understand the background that drives the design and execution decisions.

Data-Center Structure

The three main enterprise data centers for Microsoft are in Redmond, Washington; Dublin, Ireland; and Chofu, Japan. Enterprise data centers are placed where the majority of employees are located. In addition, there are 16 geographically dispersed regional data centers and approximately 400 worldwide business locations. To create the most stable infrastructure and to reduce costs, Microsoft IT chose to centralize IT operations in the Redmond data center.

A seven-member team of IT operations personnel is responsible for provisioning servers to these three data centers; the team provisions an average of 200 new servers each month.

Three levels of service are available to the owners of managed servers in the data center, as shown in Table 2.

Table 2   Service Offerings for Managed Servers at Microsoft

Level   Servers   Description
One     ~700      Power, cooling, and network taps.
Two     ~2,000    Power, cooling, and network taps.
                  Data backup support.
                  Reactive support, whereby the server owner calls Help Desk when the server is not operating properly, and then Data Center Operations (Ops) is notified and takes action.
Three   ~6,000    Power, cooling, and network taps.
                  Data backup support.
                  Full management: proactive support of the server hardware and the operating system through Microsoft Operations Manager (MOM), including monitoring for update compliance, as well as automated server provisioning with up-to-date security updates.

Full management (service level three) is critical for application servers that run the business and the core infrastructure, such as file and print servers; proxy servers; remote access servers; and servers that run Active Directory, DNS, and WINS. Service level two is often chosen for test, support, and product development lab servers. Regardless of the service level chosen, each server owner is responsible for managing and maintaining servers at the application level and higher—for example, user rights.

At Microsoft, the server owners must use approved versions of server software and the latest updates. The server owners must also use hardware that is manufactured by approved vendors according to specifications. This standardization keeps costs down, improves reliability of the platforms, increases service availability, and supports centralized and remote monitoring and management.

Operations Model

When a problem with a fully managed server is identified at Microsoft, the problem is escalated as follows:

  • Tier 1, Help Desk.   Most application server issues are detected at Tier 2. However, if the server owner or an application user identifies the problem, he or she contacts Help Desk. Hardware issues for regional servers are handled locally. Regional helpdesk technicians perform any needed hands-on server operations and provide first-line support to the user community in their native language.
  • Tier 2, Data Center Operations.   Ops uses Microsoft Operations Manager alerts to proactively monitor servers for problems so that many problem reports circumvent Help Desk. However, if the server owner identifies the problem and contacts Help Desk, Help Desk then contacts the Ops or messaging operations team for further action. When Ops is alerted, it handles the initial response, spending a defined amount of time (such as 15 minutes) on the issue by using an internally developed Troubleshooting Guide (TSG). Ops also initiates the trouble ticket in an internally developed ticketing system. This system integrates the ticket tracking application with several knowledge management functions, such as the product group knowledge base, TSGs, and other internal resources. The ticketing system is used to manage the incident life cycle from detection and recording to investigation, diagnosis, and resolution.
    TSGs are created when an issue is common and the resolution is known and easily implemented. If Ops cannot resolve the issue, it escalates according to instructions in each TSG for investigation and root-cause resolution. TSGs are linked to alerts in the custom ticketing application.
  • Tier 3, Infrastructure Support (IS) and Advanced Diagnostics and Debug (ADD) teams.   Depending on the nature of the problem, Ops can contact either IS or ADD. IS provides end-to-end measurement and management of core infrastructure services. ADD specializes in debugging Microsoft Windows® operating system issues and communicates directly with the product development groups.
  • Tier 4, Engineering.   IS contacts Engineering if resolving the problem involves modifying the IT architecture, hardware standards, or software standards. For example, Engineering creates holistic standard platforms that facilitate server provisioning and periodic maintenance updates so that all new and existing servers are patched up to current standards.

Microsoft IT upgraded its desktop patch management infrastructure from Microsoft SMS version 2.0 to SMS 2003. The purpose of this upgrade was twofold: to test the upgrade to SMS 2003 in a production environment before release to the public, and to deploy a comprehensive solution for providing relevant software and updates to employee desktop computers quickly and cost-effectively.

Note
   
For information about how Microsoft IT uses SMS 2003 for desktop patching, see the IT Showcase white paper Desktop Patch Management with Microsoft Systems Management Server 2003.

Microsoft IT examined patching requirements at Microsoft and decided to create separate SMS 2003 infrastructures: one to patch servers and one to patch desktop computers. Microsoft IT based this decision on the following factors:

  • The priorities for patching servers and desktop computers are different.
  • There are more than 10 times as many desktop computers as servers, yet the desktop computers depend on the reliability and accessibility of the servers to fully function.
  • The corporate standard configuration (called the baseline) for servers at Microsoft is much more uniform and stable than for desktop computers. In addition, the rate of change in the server population is smaller than that for the desktop population, though high compared to many enterprises. For example, around 5,000 desktop computers per month are rebuilt or reimaged for development and testing purposes. Newly provisioned or rebuilt servers in the production data centers average 200 per month. Existing servers are baselined every six months to one year with a standard maintenance update process.
  • Early adopter product testing required Microsoft IT to implement both an upgrade to the existing SMS 2.0 infrastructure for desktop patching and a "greenfield" deployment to the new server patching infrastructure. The new deployment for server patching also had to interoperate with existing internal server patching tools, server build procedures, and change management procedures.

The SMS hierarchy for desktop computers consists of two distinct site server roles: primary and secondary. A primary site server has a database running Microsoft SQL Server™ 2000 Service Pack 3a (SP3a), but a secondary site server does not. To ensure peak performance of security update and software update installation, the Microsoft IT server patching hierarchy contains only primary site servers.

SMS sites are also described by their management relationship to other sites in the hierarchy, as follows:

  • Central site.   The central site is the primary site at the top of the hierarchy; it is the one site in a hierarchy that is not a child to any other site. Therefore, the SMS site database at the central site acquires the data of the entire hierarchy. The central SMS site database stores the inventory information for the central site and all of its subsites.
  • Parent site.   A parent site is a primary site that includes at least one other site beneath it in an SMS hierarchy. Only a primary site can have child sites.
  • Child site.   A child site is a site that reports to a site above it in the hierarchy. SMS copies all information collected at a child site to the parent site, which in turn reports all of the accumulated data to its parent site. The Microsoft IT design has no child sites, so none of its primary site servers constitute parent sites.
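The central, parent, and child roles described above form a tree in which inventory flows upward. The following minimal Python sketch models that roll-up; the `Site` class and the site codes are hypothetical and are not SMS objects. Because the Microsoft IT design has no child sites, every one of its sites would fall into the "no children" case here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Site:
    code: str
    parent: Optional["Site"] = field(default=None, repr=False)
    # Client name -> inventory record collected at this site.
    local_inventory: dict = field(default_factory=dict)
    children: list = field(default_factory=list, init=False, repr=False)

    def __post_init__(self):
        # Register this site beneath its parent so data can roll up.
        if self.parent is not None:
            self.parent.children.append(self)

    @property
    def is_central(self):
        # The central site is the one site that is not a child of any other.
        return self.parent is None

    @property
    def is_parent(self):
        # A parent site has at least one site beneath it.
        return bool(self.children)

    def rolled_up_inventory(self):
        """Return this site's inventory plus everything reported by its subsites."""
        combined = dict(self.local_inventory)
        for child in self.children:
            combined.update(child.rolled_up_inventory())
        return combined


central = Site("CEN", local_inventory={"srv1": {}})
child = Site("CH1", parent=central, local_inventory={"srv2": {}})
```

In this model the central site's rolled-up inventory contains every client in the hierarchy, mirroring how the central SMS site database acquires the data of all its subsites.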

Figure 1 shows the SMS 2003 server architecture at Microsoft for server patch management.


Figure 1   SMS server patching architecture at Microsoft

Inside Microsoft, the SLA requirement for installation of critical security updates for servers is less than 24 hours. The SLA for noncritical updates for servers is less than seven days.

The server patching design that Microsoft IT uses internally has only two server tiers. The design drivers are:

  1. The primary sites correspond with the underlying network infrastructure bandwidth.
  2. Servers must be patched in the shortest possible time.

However, Microsoft IT recommends that organizations examine all network performance implications when designing an SMS infrastructure in a bandwidth-constrained environment.

The servers in the Microsoft IT design are four-CPU, 1.5-gigahertz (GHz) computers with 2 gigabytes (GB) of random access memory (RAM), a 34-GB hard disk drive for SMS, and a 34-GB hard disk drive for the database.

Server Patch Management Process

Patching thousands of servers is a cross-team cooperative effort that involves the following specialized teams and roles at Microsoft:

  • Microsoft Security Response Center (MSRC).   The MSRC investigates product issues that are reported directly to Microsoft, as well as issues discussed in certain popular security newsgroups. The MSRC releases security bulletins about vulnerabilities.
  • Corporate Security Compliance.   The Corporate Security team (within Microsoft IT) reviews the bulletins posted by the MSRC and assigns a deployment priority to them according to internal security needs. For example, the threat may not be as imminent to Microsoft because of Microsoft's implementation of previous updates or of protective firewalls. The security analyst within Corporate Security recommends the enforcement dates for updates and facilitates the flow of information between the Corporate Security and Data Center Operations organizations.
  • Data Center Operations.   The Ops team manages the data centers and hosts the SMS infrastructures. The SMS patch administrator within Ops creates and prepares the update deployment package and distributes the update according to the recommended target computers and enforcement date. The patch administrator uses the SMS 2003 Distribute Software Updates Wizard to ensure that the right updates reach the right computers within the prescribed time. The patch administrator adds the updates to the next scheduled software distribution, authorizes the update, sets the update properties to the proper enforcement date (according to the severity), and sends the update to the targeted groups of servers for distribution.

Microsoft IT follows two schedules for deploying updates to servers:

  • Emergency updates that have a rating of Critical are deployed to servers in less than 24 hours.
  • Updates of any lower rating, known as standard updates, are deployed monthly.

In both cases, Microsoft IT uses automatic patch enforcement in SMS 2003. If the local server administrator does not install the update within the allotted time, SMS automatically installs the update after the time elapses and forces a restart on the server to effect the changes.
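The enforcement rule above reduces to a simple decision per server. The following is a minimal Python sketch of that decision, not how SMS 2003 implements it internally; the function name and status strings are hypothetical, and the allotted time (for example, 24 hours for a critical update) is passed in by the caller.

```python
from datetime import datetime, timedelta


def enforcement_action(deployed_at, allotted, now, installed):
    """Decide what happens to one server under automatic enforcement.

    deployed_at: when the update was made available to the server.
    allotted:    timedelta the local administrator has to install it.
    now:         the current time.
    installed:   whether the update is already present on the server.
    """
    if installed:
        return "compliant"
    deadline = deployed_at + allotted
    if now < deadline:
        # The local administrator can still install at a convenient time.
        return "waiting for local administrator"
    # Deadline reached without a local install: install and force a restart.
    return "force install and restart"


released = datetime(2004, 2, 10, 11, 0)
critical_window = timedelta(hours=24)
```

For example, a server checked 25 hours after `released` under `critical_window` that still lacks the update gets `"force install and restart"`.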

Note
   
There are a few mission-critical servers that Microsoft IT has exempted from forced compliance because the server administrator must have complete control over when the server restarts. In these cases, however, the administrator is still expected to install any critical security update within 24 hours of deployment. If the administrator has not installed the update by the deadline, SMS automatically enforces it.

Figure 2 shows the overall process flow for server patching on the standard timeline, including the teams that perform each step. The same teams follow the same steps for a critical update, but on a compressed timeline.


Figure 2   Server patching process flow at Microsoft

Note
   
There is no change management decision point after analysis and before testing. For server security updates, the decision to deploy is preapproved, so the only decision in the process is how quickly to deploy the updates.

Phase 1: Monitoring for Security Bulletins and Updates from Microsoft

The release of a security bulletin from the MSRC starts the process. Each bulletin identifies the vulnerabilities it addresses and the products that they affect, and it provides detailed technical information about the vulnerabilities, updates, and workarounds, in addition to deployment considerations and download instructions for any available updates.

Security bulletins and related security updates are released monthly between 10:00 A.M. and 11:00 A.M. Pacific Time, unless Microsoft determines that customers will be better served by releasing a security bulletin at a different time. This policy was established in response to international customer feedback, with the purpose of better enabling customers to plan and schedule patch management activities. Although the Microsoft IT team is involved in the development of the update before the bulletin is released, the process of deploying the update to all servers begins only after the update is released, just as it does for other Microsoft customers.

All security bulletins and other information about Microsoft product security are available at http://www.microsoft.com/technet/security/default.mspx. All security updates included in the last two service packs for all currently supported products are available for download from this location.

The Microsoft Security Response Center offers update notification through e-mail subscription services. For customers who have more extensive knowledge of or interest in the technology behind security updates, Microsoft TechNet offers the Microsoft Security Notification Service, a free e-mail notification service. These e-mail messages are geared toward IT professionals and contain in-depth technical information. For more information or to sign up to receive the Microsoft Security Notification Service, see http://www.microsoft.com/technet/security/bulletin/notify.mspx. The Corporate Security team uses this service to stay apprised of update notifications.

Phase 2: Determining the Risk Level

The MSRC rates the urgency of updates according to the severity ratings shown in Table 3.

Table 3   MSRC Update Rating

Rating      Definition
Critical    A vulnerability whose exploitation could allow the propagation of an Internet worm without user action.
Important   A vulnerability whose exploitation could result in compromise of the confidentiality, integrity, or availability of users' data, or of the integrity or availability of processing resources.
Moderate    Exploitability can be mitigated to a significant degree by factors such as default configuration, auditing, or difficulty of exploitation.
Low         A vulnerability whose exploitation is extremely difficult or whose impact is minimal.

The Corporate Security team adapts the MSRC Maximum Severity Rating System to determine the internal deployment schedule. Emergency critical updates are sent out on a "zero day" schedule that seeks to update all servers within 24 hours. Updates rated Important, Moderate, or Low are all added to the standard monthly automatic update distribution.
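The baseline mapping from MSRC rating to internal deployment track can be written as a small lookup. The following Python sketch is illustrative only: the `Rating` enum and `internal_schedule` function are hypothetical, and the sketch does not model the adjustments Corporate Security may make based on internal conditions such as firewalls or earlier updates.

```python
from enum import Enum


class Rating(Enum):
    # MSRC severity ratings from Table 3.
    CRITICAL = "Critical"
    IMPORTANT = "Important"
    MODERATE = "Moderate"
    LOW = "Low"


def internal_schedule(rating):
    """Map an MSRC rating to the internal deployment track.

    Critical updates go out on the emergency ("zero day") track;
    every other rating joins the standard monthly distribution.
    """
    if rating is Rating.CRITICAL:
        return ("emergency", "update all servers within 24 hours")
    return ("standard", "add to next monthly automatic distribution")
```

Keeping the mapping in one place makes it easy to see that only a Critical rating triggers the compressed timeline.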

Note
   
More information about the Maximum Severity Rating System is available at http://www.microsoft.com/technet/security/default.mspx

Automated Scanning for and Reporting of Vulnerable Computers

After the deployment timeline is set, Microsoft IT must determine which servers are or might be vulnerable.

The Security Update Inventory Tool provided in the SMS Software Update Services Feature Pack extends the SMS hardware inventory to report on the security updates that are missing on a set of SMS clients (servers). Clients compare what has been installed against the list of available updates contained in the Extensible Markup Language (XML) file downloaded from the Microsoft Security website.
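The client-side comparison described above is a set difference between the updates that apply to the installed products and the updates already present. The following Python sketch shows the idea using the standard `xml.etree.ElementTree` module; the `<Catalog>`/`<Update>` schema and the update IDs are hypothetical and do not reflect the format of the actual file downloaded from the Microsoft Security website.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog format, used only for illustration.
CATALOG_XML = """<Catalog>
  <Update id="UPDATE-001" product="Windows Server 2003"/>
  <Update id="UPDATE-002" product="Windows Server 2003"/>
  <Update id="UPDATE-003" product="SQL Server 2000"/>
</Catalog>"""


def missing_updates(catalog_xml, installed_products, installed_updates):
    """Return the sorted IDs of applicable updates that are not yet installed."""
    root = ET.fromstring(catalog_xml)
    # An update is applicable only if it targets a product on this client.
    applicable = {
        update.get("id")
        for update in root.iter("Update")
        if update.get("product") in installed_products
    }
    return sorted(applicable - set(installed_updates))
```

For a client running only Windows Server 2003 with `UPDATE-001` installed, `missing_updates(CATALOG_XML, {"Windows Server 2003"}, {"UPDATE-001"})` yields `["UPDATE-002"]`.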

The Security Update Inventory Tool also includes the Microsoft Baseline Security Analyzer (MBSA), version 1.2 as of this writing, as its update scanning engine. Microsoft IT uses MBSA to scan for missing and installed updates on local and remote computers; MBSA can also scan all the computers in a given domain. Microsoft IT uses the results of these scans to identify missing updates.
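Scan results arrive per server, but deciding which servers are vulnerable to a given bulletin needs the opposite view. The following minimal Python sketch inverts per-server results into a per-update report; the input shape (server name mapped to a list of missing update IDs) is a hypothetical simplification of real scan output.

```python
from collections import defaultdict


def vulnerability_report(scan_results):
    """Invert per-server scan results into update ID -> servers still missing it.

    scan_results maps each server name to the update IDs its scan reported
    as missing; servers with nothing missing simply do not appear.
    """
    report = defaultdict(list)
    for server, missing in scan_results.items():
        for update_id in missing:
            report[update_id].append(server)
    # Sort both levels so the report reads the same on every run.
    return {update_id: sorted(names) for update_id, names in sorted(report.items())}
```

For example, `vulnerability_report({"srv1": ["U-1", "U-2"], "srv2": ["U-2"], "srv3": []})` lists `srv1` under `U-1` and both `srv1` and `srv2` under `U-2`.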

For updates that are not available or are not supported by MBSA, Microsoft IT uses the standard software distribution feature in SMS to deploy the update packages to servers.

Phase 3: Testing

No matter how much testing is performed, rolling out an update into production sometimes produces effects that cannot be replicated in a lab or test environment. To avoid negative impacts on a large number of servers, Microsoft IT can create a reference collection within SMS 2003 that contains a representative sample of all permutations of the production servers.
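Building a reference collection amounts to picking one server for each distinct configuration. The Python sketch below assumes that a configuration is defined by operating system, service pack, and server role; a real reference collection would likely draw on more inventory attributes, and the `Server` record is hypothetical rather than an SMS data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    os: str
    service_pack: str
    role: str


def reference_collection(servers: list[Server]) -> list[Server]:
    """Choose one representative server per (OS, service pack, role) permutation.

    Servers are visited in name order so that the choice is repeatable.
    """
    representatives: dict[tuple[str, str, str], Server] = {}
    for server in sorted(servers, key=lambda s: s.name):
        key = (server.os, server.service_pack, server.role)
        representatives.setdefault(key, server)  # keep the first server seen per key
    return list(representatives.values())


fleet = [
    Server("web02", "Windows Server 2003", "RTM", "web"),
    Server("web01", "Windows Server 2003", "RTM", "web"),
    Server("dc01", "Windows 2000 Server", "SP4", "domain controller"),
]
print([s.name for s in reference_collection(fleet)])  # ['dc01', 'web01']
```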

There are two primary forms of testing for successful update deployment:

  • Test the production environment's implementation of update distribution in terms of connectivity and reporting
  • Test the update itself for installation on computers representative of the production environment

To test the SMS 2003 environment, the inventory, the collections, and connectivity, Microsoft IT deploys a synthetic patch to one or more collections within the production environment. A synthetic patch is an inert payload. When an administrator deploys a distribution package that contains only a synthetic patch, it generates all the SMS 2003 reports and status messages that indicate how successfully the patch reached all targeted computers. Yet, the synthetic patch has no impact on the production environment; it does not alter the target computers or force any restarts.

Note 
  
SMS 2003 Web Repository Toolkit 1 includes a synthetic patch tool and is available as a free download at http://go.microsoft.com/fwlink/?LinkId=22859

Phases 4-7: Deploying the Patch

Standard Critical Update Deployment

For deployment of standard updates, 24 work periods or maintenance periods of four hours each are scheduled over the course of four days. Data-center server owners map each of their servers to a specific maintenance period within the overall work schedule in an internal change management database called IT Configuration. The SMS 2003 patch administrator then uses these groupings of servers to target delivery of the update, so that any restarts required are within the approved maintenance time period. Note that Microsoft IT recommends that clustered server owners place each server in the cluster into different maintenance periods, so that the entire cluster is not patched at the same time.

Figure 3 shows the work-period breakdown for an example four-day patching timeline.


Figure 3   Twenty-four work periods for standard deployment
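The standard timetable follows directly from its two parameters: 24 periods of four hours each. This sketch generates the periods from an assumed window start and finds which period a given maintenance time falls in. The start date (Thursday, February 12, 2004) is an illustrative assumption, not Microsoft IT's actual schedule.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(hours=4)
PERIOD_COUNT = 24  # 24 periods x 4 hours = 96 hours, that is, four days


def standard_periods(window_start: datetime) -> list[tuple[datetime, datetime]]:
    """Return the (start, end) of each four-hour work period in the window."""
    return [(window_start + i * PERIOD, window_start + (i + 1) * PERIOD)
            for i in range(PERIOD_COUNT)]


def period_index(window_start: datetime, when: datetime) -> int:
    """Return the 0-based work period that contains `when`."""
    offset = when - window_start
    if not timedelta(0) <= offset < PERIOD_COUNT * PERIOD:
        raise ValueError("time falls outside the four-day window")
    return offset // PERIOD  # timedelta // timedelta yields an int


start = datetime(2004, 2, 12)  # hypothetical window start (a Thursday)
print(period_index(start, datetime(2004, 2, 14, 4)))  # 13 (Saturday, 4:00 A.M.)
```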

Emergency Critical Update Deployment

For emergency critical updates, there are only four work periods of one hour each. Each emergency work period maps to an entire day in the standard timetable. After the four-hour timetable, Microsoft IT allots an additional three hours to check the success rate of the installation and ensure that the remaining servers are patched within seven hours.

Figure 4 shows how the four-hour critical update deployment periods overlay the standard work-period breakdown for the example four-day patching timeline.


Figure 4   Four work periods for emergency deployment

For example, if the server administrator has mapped the server into the IT Configuration database with a maintenance period of Saturday at 4:00 A.M. for the standard deployment timeline, the server is patched on the second Saturday of the month beginning at 4:00 A.M. However, the same server will be patched in the Hour 3 maintenance period on an emergency deployment timeline.
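The overlay of emergency hours on the standard timetable can be expressed as integer division: six four-hour periods make up one day, and each day collapses into one emergency hour. This sketch assumes 0-based standard period numbers. If the four-day window starts on a Thursday (an illustrative assumption), Saturday at 4:00 A.M. is period 13, which maps to Hour 3, matching the example above.

```python
PERIODS_PER_DAY = 6  # 24 hours / 4-hour standard work periods


def emergency_hour(standard_period: int) -> int:
    """Map a 0-based standard work period (0-23) to emergency Hour 1-4.

    Each emergency hour stands in for one whole day of the standard timetable.
    """
    if not 0 <= standard_period < 4 * PERIODS_PER_DAY:
        raise ValueError("standard period must be between 0 and 23")
    return standard_period // PERIODS_PER_DAY + 1


print(emergency_hour(13))  # 3
```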

Phase 8: Reporting

A patch administrator can use a number of tools to check the status messages that SMS clients return after a package has been advertised to them. The patch administrator can use the Advertisement Status Viewer to ascertain:

  • The number of clients in the collection that have not received the advertisement.
  • The number of clients that have received the advertisement but have not run it.
  • The number of clients that have run the program unsuccessfully.
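The three counts above are tallies over the latest status of each client in the collection. The Python sketch below works on a hypothetical, simplified mapping of client names to states rather than on real SMS status messages, and the state names are assumptions.

```python
from collections import Counter
from typing import Optional

# Simplified, hypothetical client states mapped to the viewer's categories.
LABELS: dict[Optional[str], str] = {
    None: "not received",
    "received": "received, not run",
    "failed": "run unsuccessfully",
    "succeeded": "run successfully",
}


def advertisement_summary(collection: list[str],
                          latest_state: dict[str, str]) -> Counter:
    """Count clients in `collection` by the outcome of one advertisement.

    Clients with no entry in `latest_state` have not received it yet.
    """
    tally: Counter = Counter()
    for client in collection:
        state = latest_state.get(client)
        if state not in LABELS:
            raise ValueError(f"unexpected state {state!r} for {client}")
        tally[LABELS[state]] += 1
    return tally
```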

To check that the Update Install Program is running successfully on clients, the patch administrator analyzes the status messages to determine when the program was last successfully run on each client. The patch administrator investigates any delays, which can occur if, for example, an SMS client is not turned on or is not functioning correctly.
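One way to surface such delays is to flag every client whose last successful run is missing or older than a threshold. This sketch assumes the last-success times have already been extracted from status messages; the seven-day default is an assumption chosen only because it matches the default repeat interval of the update installation agent's advertisement.

```python
from datetime import datetime, timedelta
from typing import Optional


def overdue_clients(last_success: dict[str, Optional[datetime]], now: datetime,
                    max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Return clients that have never succeeded or whose last success is too old."""
    return sorted(
        client
        for client, when in last_success.items()
        if when is None or now - when > max_age
    )


now = datetime(2004, 2, 16)
last = {"web01": datetime(2004, 2, 15), "sql01": datetime(2004, 2, 1), "dc01": None}
print(overdue_clients(last, now))  # ['dc01', 'sql01']
```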

Status messages also record the degree of voluntary versus enforced patching, and how the server administrators are managing restarts and scheduled installation. Based on this data, the patch administrator adjusts enforcement and default settings for the next round of patches to bring computers into compliance more efficiently.

For follow-up, the patch administrator uses the Compliance by Software ID report in SMS to obtain a summary of the total number of systems for which an update is installed and missing, as well as the status relating to update distribution. This report helps identify the current compliance levels for a particular update across the production environment.
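The Compliance by Software ID report summarizes installed and missing counts per update. The sketch below computes that kind of summary from hypothetical (computer, update ID, installed) rows; it does not query the SMS database or reproduce the report's exact columns.

```python
from collections import defaultdict


def compliance_by_update(records: list[tuple[str, str, bool]]) -> dict[str, dict]:
    """Summarize (computer, update_id, installed) rows per update ID."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [installed, missing]
    for _computer, update_id, installed in records:
        counts[update_id][0 if installed else 1] += 1
    summary = {}
    for update_id, (inst, miss) in counts.items():
        summary[update_id] = {
            "installed": inst,
            "missing": miss,
            "compliance_pct": round(100.0 * inst / (inst + miss), 1),
        }
    return summary


rows = [
    ("srv1", "MS04-003", True),
    ("srv2", "MS04-003", True),
    ("srv3", "MS04-003", False),
    ("srv1", "MS03-039", True),
]
print(compliance_by_update(rows)["MS04-003"])
# {'installed': 2, 'missing': 1, 'compliance_pct': 66.7}
```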

Figure 5 shows the cost and server impact of the timetable for update deployments from notification through final enforcement and follow-up. The figure also shows the escalating potential impact to server operations.


Figure 5   Cost/impact over time

Lessons Learned and Best Practices

A number of lessons learned and best practices arose from Microsoft IT's experience implementing an SMS 2003-based server patch management solution, as described in the following sections.

Establish a Change Advisory Board

Microsoft IT recommends the formation of a change advisory board (CAB) composed of representatives from areas of the business that would be affected by the security threat or the installation of a software update. CAB members should include individuals who have experience in the specific technologies and services that will be used to deploy the update, in addition to representatives from the business, network, security, service desk, and technical support teams.

The CAB should form an emergency committee whose task is to quickly authorize critical updates, that is, updates designed to close security vulnerabilities or avoid critical system failures. The emergency committee should be composed of people who have the right background and operational authority to approve emergency changes and who are available to make quick decisions.

To Control Planned Downtime, Use a Change Control Database

The Microsoft IT server administrators use an internally developed change control database (IT Configuration) to designate a specific maintenance period in the timeline for their servers. This system ensures that any downtime from a necessary restart is minimized, and it enables the local server administrator to plan ahead to work around any issues. The time period selected differs by server, geography, and business needs.

Targeting updates by using SMS for distribution to servers according to the designated maintenance periods minimizes service disruption. Administrators of clustered servers should place each cluster node in separate maintenance periods in the database, to avoid negative impacts on the entire cluster.
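The cluster guideline above is easy to check mechanically. The sketch below flags any cluster that has two or more nodes in the same maintenance period; the input mappings are hypothetical stand-ins for data held in the change management database.

```python
from collections import defaultdict


def clusters_with_shared_periods(node_cluster: dict[str, str],
                                 node_period: dict[str, int]) -> dict[str, list[int]]:
    """Return {cluster: [shared periods]} for clusters whose nodes overlap."""
    nodes_by_slot: dict[tuple[str, int], list[str]] = defaultdict(list)
    for node, cluster in node_cluster.items():
        nodes_by_slot[(cluster, node_period[node])].append(node)
    conflicts: dict[str, list[int]] = defaultdict(list)
    for (cluster, period), nodes in sorted(nodes_by_slot.items()):
        if len(nodes) > 1:  # more than one node would restart in this period
            conflicts[cluster].append(period)
    return dict(conflicts)


clusters = {"n1": "C1", "n2": "C1", "n3": "C2", "n4": "C2"}
periods = {"n1": 5, "n2": 5, "n3": 5, "n4": 11}
print(clusters_with_shared_periods(clusters, periods))  # {'C1': [5]}
```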

Streamline the SMS 2003 Installation

To make SMS 2003 easier to administer and faster to run, Microsoft IT recommends enabling only the SMS features that the administration team will use. Such an installation not only makes local SMS administration simpler and faster; it also significantly reduces resource requirements throughout the production environment, including network traffic volume and the memory and storage requirements for site servers, distribution points, and management points.

Aggressively Monitor and Manage SMS Clients

Aggressively monitor and manage your clients. A server without a healthy SMS Advanced Client cannot be patched by SMS. Even the servers that are marked as exceptions to the regular patch management process at Microsoft have the SMS client installed and report status. However, at any given time, some number of clients will not be reporting status. You should investigate and resolve these issues. For example, at Microsoft the most frequent reason for a missing client status report is that the server was momentarily unavailable to a network PING request at the time the report was run.
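Because a single missed PING is the most frequent cause of a missing report, it can help to retry before flagging a client. This sketch is a generic retry loop, not an SMS feature; the probe function, the attempt count, and the delay are all assumptions.

```python
import time
from typing import Callable


def reachable_with_retries(probe: Callable[[], bool], attempts: int = 3,
                           delay_seconds: float = 5.0) -> bool:
    """Return True as soon as `probe()` succeeds; False only if every attempt fails.

    `probe` is a hypothetical, caller-supplied reachability check.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # give a momentarily busy server time to respond
    return False
```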

Suspend Monitoring During Patching

Microsoft IT uses Microsoft Operations Manager (MOM) to monitor servers. To suppress thousands of unnecessary event alerts, MOM monitoring is turned off immediately before patching and re-enabled afterward.
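Turning monitoring off and reliably back on is a natural fit for a context manager, which re-enables monitoring even if patching fails. The `monitor` object and its `disable`/`enable` methods are hypothetical placeholders; this sketch does not use MOM's real interfaces.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def monitoring_suspended(monitor, server: str) -> Iterator[None]:
    """Disable monitoring for `server`, then always re-enable it afterward."""
    monitor.disable(server)
    try:
        yield
    finally:
        monitor.enable(server)  # runs even if patching raised an exception


class PrintingMonitor:
    """Hypothetical stand-in for a monitoring system's administrative API."""

    def disable(self, server: str) -> None:
        print(f"monitoring off: {server}")

    def enable(self, server: str) -> None:
        print(f"monitoring on: {server}")


with monitoring_suspended(PrintingMonitor(), "web01"):
    print("patching web01")
```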

Make Status Self-Serve

To enable local server administrators to check the patching status of their servers, Microsoft IT built an internal website and tool called Serverpmstatus. After the server administrator provides valid authentication credentials, the website queries the database used for change and configuration management and returns a list of servers and status for which the administrator has Owner, Authorizer, or Notify permissions. If the status shows that the server is vulnerable, the administrator can manually patch the server or wait for the automated patch to be applied during the work period defined in the database.

Status on the Serverpmstatus website is updated every four hours. Figure 6 shows an example screen shot of the tool.


Figure 6   Example patching status report

This example shows that the internal Corporate Security scans and SMS vulnerability scans are staggered for maximum effectiveness. The listing for the MS03-051 patch appears as missing in the SMS scan because it is not supported by MBSA 1.0. Security updates that are not supported by MBSA 1.0 are still patched during the defined maintenance period.
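The Serverpmstatus query is essentially a permission-filtered lookup of patch status. Serverpmstatus is an internal tool whose implementation is not described here, so the sketch below only illustrates the filtering rule; the record layouts and the "Reader" role in the example are hypothetical.

```python
VISIBLE_ROLES = {"Owner", "Authorizer", "Notify"}


def servers_for_admin(admin: str,
                      assignments: list[tuple[str, str, str]],
                      status: dict[str, str]) -> list[tuple[str, str]]:
    """List (server, status) pairs the administrator is allowed to see.

    `assignments` holds (server, administrator, role) rows, a hypothetical
    stand-in for the change and configuration management database.
    """
    visible = {
        server
        for server, who, role in assignments
        if who == admin and role in VISIBLE_ROLES
    }
    return [(server, status.get(server, "unknown")) for server in sorted(visible)]


rows = [
    ("web01", "alice", "Owner"),
    ("sql01", "alice", "Notify"),
    ("dc01", "alice", "Reader"),
    ("web02", "bob", "Owner"),
]
print(servers_for_admin("alice", rows, {"web01": "Patched", "sql01": "Vulnerable"}))
# [('sql01', 'Vulnerable'), ('web01', 'Patched')]
```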

Communicate the Rollout Schedule to the Organization

The patch administrator should send a clear and easily identifiable e-mail message to server administrators, informing them about the update and providing information about how to install it. This mail should be flagged for follow-up to remind administrators of the actions they need to take.

Assign Software Distribution Points

After the package has been imported into SMS, Microsoft IT decides which distribution points should be used to make the update available.

In general, updates will be deployed to the same groups of servers each time, and so the same distribution points will be used for each update. For example, updates for servers should go to all distribution points in the data center's site. The SMS patch administrator can then set up distribution point groups that contain only distribution points for particular ranges of servers. Use of distribution point groups expedites the process of assigning distribution points to updates being deployed.

The SMS patch administrator should use the inventory information within the SMS database to identify where new distribution points are needed. Note, however, that simply adding a distribution point to a distribution point group does not cause the package to be sent to the new distribution point, even if the Update Distribution Points option is used. New distribution points should be added one by one to the package, and then the distribution points can be updated.

Stage Updates on Distribution Points

After the appropriate distribution points have been assigned, administrators should ensure that copies of all the individual files are distributed to these servers. Use the SMS status system to monitor the progress of distribution of the update files.

Monitor Bandwidth When Sending Updates Between SMS Sites

For normal update distribution, to avoid overloading your network, limit either the amount of network bandwidth used or the times of day that transmission can occur when sending instructions, software packages, and advertisements between sites.

SMS enables an enterprise to define package priority as High, Medium, or Low. Microsoft IT reserves the High priority for critical updates only.

For emergency updates, Microsoft IT lifts all intersite restrictions and allows updates to be sent to other sites as quickly as possible. If network links between sites are slow or are already congested, lifting restrictions on the intersite sender has no effect. In these cases, Microsoft IT considers sending the update to each site by using the SMS courier sender.
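The two controls described above, package priority and intersite sending limits, can be modeled as a queue ordering plus a bandwidth rule. In SMS itself these settings are configured on packages and sender addresses rather than written as code; the business hours and the 25 percent cap below are illustrative assumptions.

```python
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def send_queue(packages: list[tuple[str, str]]) -> list[str]:
    """Order (package, priority) pairs so that High-priority packages go first.

    sorted() is stable, so packages of equal priority keep their queue order.
    """
    ordered = sorted(packages, key=lambda p: PRIORITY_ORDER[p[1]])
    return [name for name, _priority in ordered]


def allowed_bandwidth_pct(hour: int, emergency: bool = False,
                          business_hours: range = range(8, 18),
                          business_cap_pct: int = 25) -> int:
    """Percentage of an intersite link that a transfer may use at `hour`.

    Emergency updates lift all restrictions; otherwise transfers are capped
    during (hypothetical) business hours and unrestricted outside them.
    """
    if emergency:
        return 100
    return business_cap_pct if hour in business_hours else 100
```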

Select Deployment Groups

When administrators use the Distribute Software Updates Wizard to distribute a new update, they do not have to target computers precisely. The wizard deploys a smart agent to the client, which is invoked when a new update is to be installed. This agent automatically determines whether an update advertised through the wizard is applicable to that computer and whether it has already been installed. It also handles chaining multiple updates and the restarts needed for the updates to take effect.
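The agent's behavior can be summarized as a filter, install, and restart loop. The sketch below is a hypothetical model of that behavior, not the agent's actual code: applicability and installed state are supplied as inputs (the real agent derives them from scan data), and `install` is a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AdvertisedUpdate:
    update_id: str
    applicable: bool     # does the update apply to this computer?
    installed: bool      # is it already present?
    needs_restart: bool


def run_update_agent(updates: list[AdvertisedUpdate],
                     install: Callable[[str], None]) -> tuple[list[str], bool]:
    """Install each applicable, missing update in turn, then report whether
    a single restart is needed at the end of the chain."""
    installed_now: list[str] = []
    restart_required = False
    for update in updates:
        if not update.applicable or update.installed:
            continue  # nothing to do for this update on this computer
        install(update.update_id)
        installed_now.append(update.update_id)
        restart_required = restart_required or update.needs_restart
    return installed_now, restart_required
```

Deferring the restart until the whole chain has been installed reflects the point above that the agent handles chaining: several updates can share one restart.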

If administrators do not use the Distribute Software Updates Wizard and instead distribute updates through a custom package and collection, they build the distribution list by creating one or more SMS queries.

Advertise the Update to Client Computers

When administrators use the Distribute Software Updates Wizard, a repeating advertisement is automatically created to run the update installation agent on computers in the target collection. The repeat interval can be changed from the default of seven days, as appropriate for the collection. If different schedules are needed for different types of computers, multiple advertisements can be created for multiple collections, using the same package and program. If the update installation agent is set to repeat daily and must run sooner for the rollout of a critical update, a new, one-time, mandatory assignment should be made so that the advertisement runs as soon as possible.
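The interaction between the repeating advertisement and a one-time mandatory assignment comes down to taking whichever run is earlier. This sketch assumes the agent's last run time is known; the function is illustrative and not part of SMS.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_agent_run(last_run: datetime,
                   repeat: timedelta = timedelta(days=7),
                   one_time_assignment: Optional[datetime] = None) -> datetime:
    """Return when the update installation agent will next run.

    The repeating advertisement fires `repeat` after the last run; a one-time
    mandatory assignment, if scheduled earlier, takes precedence.
    """
    recurring = last_run + repeat
    if one_time_assignment is not None and one_time_assignment < recurring:
        return one_time_assignment
    return recurring


print(next_agent_run(datetime(2004, 2, 1)))  # 2004-02-08 00:00:00
```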

Sometimes, Microsoft IT must distribute updates through a custom package and collection, such as the January 2004 Microsoft Data Access Components (MDAC) Security Update 832483.

Test the Impact of the Update

No matter how much testing is performed, rolling out an update into production often produces effects that cannot be replicated in a lab or test environment. To avoid negative impacts on a large number of servers, create a reference collection within SMS 2003 that contains a representative sample of all permutations of the organization's servers. This is an efficient way to test whether the update will be successfully installed on all platforms in the organization.

Initially, test the update deployment's basic functionality. Then, gradually add levels of complexity at each successive stage. Document the results at the completion of each test phase and verify the findings against the project requirements. Investigate and resolve any problems before moving forward.

Model the test lab on the organization's production environment. If the organization uses standard client and server hardware configurations, use these configurations in the lab. As far as possible, use the same hardware, software, network, logon scripts, and other technologies used in the production environment. If the production environment includes computers with nearly full disks, obsolete and possibly unused software, or an assortment of different network adapter cards, install some lab computers with the same characteristics. If routers or slow links connect production networks, duplicate these conditions in the lab. Some organizations use server backups restored to unused or outdated server hardware for this purpose.

Deploy updates in timed phases to avoid stressing the entire network's bandwidth. In general, deploy updates by time zone to match the off-peak usage of the network so that more bandwidth is available for distributing the updates.
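Phasing by time zone can be sketched as ordering sites by the UTC instant at which their local off-peak period begins. The site names, the 2:00 A.M. local off-peak start, and the fixed UTC offsets are assumptions; fixed offsets ignore daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone


def off_peak_waves(site_utc_offsets: dict[str, int], day: date,
                   local_off_peak_hour: int = 2) -> list[tuple[datetime, str]]:
    """Order sites by the UTC instant their local off-peak period begins."""
    waves = []
    for site, offset_hours in site_utc_offsets.items():
        local_tz = timezone(timedelta(hours=offset_hours))
        local_start = datetime(day.year, day.month, day.day,
                               local_off_peak_hour, tzinfo=local_tz)
        waves.append((local_start.astimezone(timezone.utc), site))
    return sorted(waves)


sites = {"Redmond": -8, "Dublin": 0, "Tokyo": 9}
for start_utc, site in off_peak_waves(sites, date(2004, 2, 14)):
    print(start_utc.isoformat(), site)
# 2004-02-13T17:00:00+00:00 Tokyo
# 2004-02-14T02:00:00+00:00 Dublin
# 2004-02-14T10:00:00+00:00 Redmond
```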

If a server has an absolutely essential function and peak demand is not regular or predictable, consider exempting that server from a forced update. There are certain servers at Microsoft that have such an exemption. The administrators are still obligated to update within the deployment period, but the SMS distribution does not force an update on those servers at the end of the period.

In most cases, however, the following practices do the most to minimize the impact on servers:

  • Use the persistent icon on all deployments.
  • Base the period of forced update installation on the servers' periods of low demand.
  • Use an automatic, periodic distribution that bundles several updates together.

Most servers have peak and off-peak periods of activity. When setting up times for forced patch installation, set the forced update time to coincide with the server's off-peak hours. Information about off-peak hours should be available in the change management database.
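Choosing the forced-installation time reduces to finding the first off-peak start at or after the deadline. This sketch assumes a single daily off-peak start hour per server, as might be recorded in the change management database; the function is illustrative.

```python
from datetime import datetime, timedelta


def forced_install_time(deadline: datetime, off_peak_start_hour: int) -> datetime:
    """Return the first daily off-peak start at or after `deadline`."""
    candidate = deadline.replace(hour=off_peak_start_hour, minute=0,
                                 second=0, microsecond=0)
    if candidate < deadline:
        candidate += timedelta(days=1)  # today's off-peak start has already passed
    return candidate


print(forced_install_time(datetime(2004, 2, 14, 10, 30), 2))  # 2004-02-15 02:00:00
```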

Establish Enforcement Policy

Microsoft IT's policy is to require the deployment of emergency critical security updates within 24 hours and standard critical security updates monthly. If the local server administrator does not comply within that time period, the SMS distribution program automatically updates the server and restarts it to effect the changes. This process gives administrators reasonable time to coordinate the restart with business needs, but also ensures that the update is installed. On average, the owners of around 2% of all managed servers patch them before the deadline.

If a server is not brought into compliance within the compliance window for a security update, a Microsoft IT administrator disables that server's network port. Although this action shuts down all network traffic to and from that server, it is preferable to the server propagating a virus or worm to its clients and beyond. The server administrator then contacts the helpdesk to start the process that ensures the update is installed, the server is restarted, and the port is re-enabled as soon as possible.
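The escalation path described above can be written as a small decision function. The sketch assumes a separate compliance-window end time that falls after the forced-installation deadline (the gap between the two is an assumption), and it omits servers that are exempted from forced updates; the names and timing model are illustrative.

```python
from datetime import datetime
from enum import Enum


class Action(Enum):
    NONE = "compliant"
    WAIT = "owner may still patch voluntarily"
    FORCE_INSTALL = "SMS installs the update and restarts the server"
    DISABLE_PORT = "disable the server's network port"


def enforcement_action(installed: bool, now: datetime,
                       deadline: datetime, window_end: datetime) -> Action:
    """Decide the next enforcement step for one server and one update.

    Assumes deadline <= window_end; `window_end` is a hypothetical cut-off
    after which a still-unpatched server is isolated from the network.
    """
    if installed:
        return Action.NONE
    if now < deadline:
        return Action.WAIT
    if now < window_end:
        return Action.FORCE_INSTALL
    return Action.DISABLE_PORT
```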

Plan Disaster Recovery

The Microsoft IT SMS implementation consists of dedicated stand-alone SMS infrastructure in addition to the SMS service running on key infrastructure platforms. Disaster recovery steps need to account for this implementation, because other services on an infrastructure platform may need to be reinstalled and reconfigured prior to SMS 2003 Advanced Client installation. Microsoft IT handles disaster recovery by using the automated server build process and tools, but customers without an automated process should ensure that software is restored in the proper order. For example, if some of your secondary sites reside on domain controllers in regional tail sites, the domain controller should be restored first, and then the SMS site should be reinstalled. The disaster recovery steps for dependent services should be documented in the business continuance plans.
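Restoring software in the proper order is a topological sort over service dependencies. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the dependency chain in the example is hypothetical and would in practice come from the business continuance plans.

```python
from graphlib import TopologicalSorter


def restore_order(prerequisites: dict[str, set[str]]) -> list[str]:
    """Return a restore sequence in which every component follows its prerequisites.

    `prerequisites` maps each component to the components that must be restored
    before it. Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical regional tail site where the SMS secondary site runs on a domain controller.
deps = {
    "SMS secondary site": {"Domain controller"},
    "SMS 2003 Advanced Client": {"SMS secondary site"},
}
print(restore_order(deps))
# ['Domain controller', 'SMS secondary site', 'SMS 2003 Advanced Client']
```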

Implement the SMS 2003 Advanced Client Throughout the Enterprise

The Advanced Client supports several features in SMS 2003 that provide advantages in patch management, including:

  • Ability to implement Advanced Security.
  • Compatibility with Active Directory, which makes inventorying and patch management quicker and more reliable. Implementing Advanced Security is a prerequisite for using Active Directory with schema extensions in SMS 2003.
  • More automated update deployment. Legacy clients require manual installation or script installation and are limited in their ability to find paths to source software required for some update installations.
  • Improved ability to generate reports and status messages (compared with legacy clients), making the administrator's tasks of gathering patch status and metrics easier.
  • Support for the version of MBSA included with SMS 2003, which enables administrators to ascertain software revision levels on all Advanced Client servers and desktop computers throughout the network. Legacy clients do not support this version of MBSA.

Create the Appropriate Positions and Teams

Microsoft IT recommends that the IT department of a large enterprise have an administrator dedicated solely to patch management. Patch management requires:

  • Complete knowledge of the update information available.
  • Processes to assess, configure, test, and deploy updates.
  • The ability to interact with security departments and coordinators.
  • Thorough knowledge of the IT infrastructure.
  • Mastery of SMS 2003 tools.
  • Knowledge and time to test update deployments.
  • Examination of reports and status messages to ascertain success and troubleshoot failure.

In larger organizations, another person or committee should evaluate and prioritize updates regarding urgency and applicability to the enterprise's IT infrastructure.

The patch management team in an enterprise should possess the following minimum levels of certification and skills:

  • Microsoft Certified Professional (MCP) or Microsoft Certified Systems Engineer (MCSE) in SMS, Microsoft Windows 2000 Server, and Windows 2000 Active Directory.
  • IT Infrastructure Library (ITIL) Foundation certificate, and ideally an ITIL master's certificate.
  • Familiarity with the key issues and technology around patch management and software distribution.
  • Training in MOF concepts and principles.
  • Experience in managing and delivering complex process-based and technology-based projects.

The enterprise may also need to formulate its own exercises to ensure that patch administrators know how to respond to an update notification, create inventory collections, configure update deployments, create a reference collection, deploy a synthetic update and evaluate the results, and set up automated functions such as automatic deployment schedules and enforced installation. Depending on the solution architecture and configuration, patch administrators may need to know how to write installation scripts and manually interact with update installation processes.

In addition, generalized operations training may be accomplished through courseware developed by Microsoft and delivered through a variety of vendors. Applicable courses include Microsoft Operations Framework Essentials, Microsoft Operations Framework Changing Quadrant, and Managing a Microsoft Windows 2000 Network and Environment.

The structure of the patch management team should generally match the structure described in the following sections.

Design Team

The design team creates a design document that outlines how to perform the activities to be conducted during each phase of patch management. For example, the design document should describe how the organization registers for, receives, and monitors information about new updates. Patch management designs will vary according to the size and complexity of the organization.

Project Team

The project team should review operations tasks in the design document to determine which ones are required for the patch management solution being created. The project team then interacts with the IT organization customer to assign responsibilities to groups or to individuals to facilitate those tasks.

Test Team

The size and number of tasks assigned to the test team depend on a number of factors, including:

  • The size of the IT infrastructure.
  • The diversity of programs installed.
  • The diversity of connection types and speeds.
  • The rigidity or variability of the software version baseline.
  • The business priorities.

Someone with proven, in-depth technical skills should lead the team. Microsoft IT recommends that the team include personnel who are responsible for supporting the patch management solution after it is deployed. The team as a whole should have a good understanding of the business, the business objectives, and the reasons behind the deployment.

Patch Project Manager

The update project manager investigates the update and reads the relevant Knowledge Base article to understand what the update fixes. The update project manager must consider whether an update is specific only for particular scenarios or configurations. It might be necessary to build SMS queries and Web reports to obtain this information. For example, if the Knowledge Base article states that the update is required only for computers with a certain processor or minimum amount of memory, the reviewer may need to create a query or Web report to determine whether the update is applicable to any computers in the enterprise.

Alternatively, if the update in question is a critical security update, the update project manager may need to forward it for deployment as a preventive measure, so that the vulnerability is closed before it can be exploited.
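The Knowledge Base conditions in the example above translate into a filter over hardware inventory. In SMS this would be written as a query or Web report; the Python sketch below shows only the filtering logic, over hypothetical inventory records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InventoryRecord:
    name: str
    processor: str
    memory_mb: int


def applicable_computers(inventory: list[InventoryRecord],
                         processor: Optional[str] = None,
                         min_memory_mb: Optional[int] = None) -> list[str]:
    """Return names of computers that meet every condition that is given."""
    return sorted(
        record.name
        for record in inventory
        if (processor is None or record.processor == processor)
        and (min_memory_mb is None or record.memory_mb >= min_memory_mb)
    )


inventory = [
    InventoryRecord("a", "x86", 512),
    InventoryRecord("b", "IA64", 4096),
    InventoryRecord("c", "x86", 4096),
]
print(applicable_computers(inventory, processor="x86", min_memory_mb=1024))  # ['c']
```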

Patch Administrator

The patch administrator should:

  • Develop a plan for making the required changes.
  • Determine and obtain the resources required.
  • Arrange for the development of any scripts, tools, and documentation needed to deploy the changes.
  • Ensure that adequate testing is carried out.
  • Ensure that the changes are deployed into production.
  • Assess the success or failure of deployment.

A designated patch administrator ensures that necessary collecting, configuring, testing, patching, and reporting steps are all followed so that no systems are left unprotected within the production environment. SMS 2003 includes several predefined patch-related reports for discovering missing updates. They display such information as applicable security updates and installation status for a specified update. For example, the "Count of Applicable Software Updates by Type" report identifies updates that are relevant to computers in the production environment but that have not yet been installed.

Identify Computers That Were off the Network

Enforcing security policy can be complicated by distributed administration and vulnerable assets that are not centrally managed. Those responsible for administering the asset and resolving the vulnerability may be unknown or hard to find, may physically reside within another department in the organization, or may not have the necessary skills to resolve the vulnerability on their own.

Note
   
For more information about how the Microsoft IT group manages vulnerabilities, see Managing Computer Vulnerabilities at Microsoft.

Baseline the Environment

A baseline is a set of documented standard configurations of a product or system that is established at a specific point in time. Baselines establish a standard that systems of the same class and category are required to match. Effective IT operations use baselines as a trusted point from which systems are built and deployed. Microsoft IT updates server baselines twice a year. The configuration defined by a baseline should be stringently tested and certified by the hardware vendor.

Baselining requires an accurate inventory of the computers and services within the production environment, including all software that is required by different types of systems or server roles, such as print server, domain controller, or messaging server. The baselines include operating system version and application versions plus all required software updates. A number of baselines might be required, depending on the variation of hardware, software, and business organization.

Baselines for servers should be as simple as possible and rigidly enforced. At Microsoft, the minimum server version level for most servers is Windows Server 2003, and some servers are already running prerelease versions of the next major release of the Windows Server operating system. This is a much narrower definition than is allowed for desktop computers. This minimum level also ensures that all servers throughout the organization have the SMS 2003 Advanced Client installed.

Servers that fall below a release or update baseline should be addressed through problem management and update planning and deployment. After an initial baseline inventory scan, bring the sub-baseline hosts up to baseline compliance. During subsequent cycles of inventory, if the host repeatedly falls below the baseline, forward information about that incident to the problem management organization for further investigation. This symptom may indicate a system that has issues regarding distribution, schedules, or permissions, or that may require special care through exception handling.

At Microsoft, servers that exceed the upper limit of the baseline range are not automatically exempt from patch management. Systems that exceed the approved baseline contain application versions or updates that have not been interoperability tested and formally approved by Data Center Operations and Corporate Security. This is an important consideration when a server is running beta or prerelease software, which is a constant factor at Microsoft.

Check computers that exceed their class baseline to determine whether unauthorized changes have occurred. In some cases, an above-baseline system may need to be returned to a trusted level or exempted from update deployments until the corporate baseline has risen to match it.
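The below, within, and above classification can be sketched by comparing version tuples against the approved range. Real baselines also cover installed updates and application versions; representing a host by a single (major, minor, build) tuple is a simplification, and the prerelease version in the example is hypothetical.

```python
Version = tuple[int, ...]


def baseline_status(version: Version, low: Version, high: Version) -> str:
    """Classify a host's version against the approved baseline range [low, high]."""
    if low > high:
        raise ValueError("baseline lower bound exceeds upper bound")
    if version < low:
        return "below"   # bring up to baseline; repeat offenders go to problem management
    if version > high:
        return "above"   # check for unauthorized changes, then roll back or exempt
    return "within"


SERVER_BASELINE = ((5, 2, 3790), (5, 2, 3790))  # Windows Server 2003 (illustrative)
print(baseline_status((5, 0, 2195), *SERVER_BASELINE))  # below: Windows 2000
print(baseline_status((6, 0), *SERVER_BASELINE))        # above: hypothetical prerelease build
```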

Consolidate Updates into Service Packs

After a number of updates accumulate for a software version, Microsoft rolls these fixes into a service pack. The service pack contains all the software updates and security updates that have been issued up to the release of the service pack.

The patch administrator should become familiar with the correlation between a service pack and the updates that it contains. This familiarity can streamline the process of installing the proper baseline on a computer newly brought online. This way, a new computer might be installed with the original software plus one or two service packs, instead of retrieving and installing a large number of updates to bring the platform up to the corporate baseline.
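Because a service pack rolls up earlier updates, planning a new build can start by choosing the service pack that covers the most required updates and listing whatever remains to be installed individually. The service pack contents and update IDs below are hypothetical.

```python
from typing import Optional


def build_plan(required: set[str], service_packs: dict[str, set[str]]
               ) -> tuple[Optional[str], list[str]]:
    """Pick the service pack that covers the most required updates.

    Returns the chosen service pack (None if there is none) and the updates
    that must still be installed individually afterward.
    """
    best = max(service_packs,
               key=lambda sp: len(required & service_packs[sp]),
               default=None)
    covered = service_packs[best] if best is not None else set()
    return best, sorted(required - covered)


packs = {"SP1": {"Q1"}, "SP2": {"Q1", "Q2", "Q3"}}
print(build_plan({"Q1", "Q2", "Q3", "Q4"}, packs))  # ('SP2', ['Q4'])
```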

Continually Improve the Process

Staffing is as important as the features of SMS 2003, the implementation, the server hierarchy, and how the patch management process is defined. To arrive at the best process, implement a process change structure and follow it to make and track changes to the standard process until it reaches a satisfactory level of performance and the desired percentage of automated update installation and compliance.

For example, because Microsoft releases updates on a monthly schedule, Microsoft IT must plan ahead and resource appropriately. Figure 7 shows an example of the work breakdown structure for regular monthly patch management activities so that proper staffing levels can be arranged in advance.


Figure 7   Sample monthly patch management work breakdown structure

In addition, Microsoft IT operates on the principle that "you cannot improve what you do not measure." To improve your process over time, gather performance statistics. Some suggested metrics to collect are shown in Table 4.

Table 4   Sample Patch Management Metrics

Measurement: Patching activity
Example: Five per month
Actions to take:
  • Baseline for comparison.

Measurement: Ratio of rejected patch Requests for Change (RFCs)
Example: One out of six
Actions to take:
  • Document RFC completion requirements.
  • Educate staff on RFC completion requirements.
  • Enforce RFC completion through the Change Log tool.

Measurement: Ratio of emergency patches
Example: One out of four
Actions to take:
  • Implement mitigation strategies and tactics to reduce the attack surface.

Measurement: Patch success ratio (per patch)
Example: 97%
Actions to take:
  • Systematically document failure modes and incorporate them into the testing scheme.

Measurement: Number of support incidents (per patch)
Example: Nine
Actions to take:
  • Produce reusable workarounds.
  • Bring rogue systems into baseline compliance (upgrade, service pack, etc.).
  • Provide self-help on the website.
  • Push self-help to users through e-mail, voice mail, or another notification mechanism.
  • Better prepare and educate the helpdesk.

Measurement: Cost of downtime, productivity loss, or lost business transactions (per update)
Example: $25,000
Actions to take:
  • Process improvements that lower this cost improve profitability.
  • Use this number to guide patching timelines.

Measurement: Time from test success to 60% saturation deployment
Example: 1: 75 hours; 2: 12 days; 3: 30 days
Actions to take:
  • Circumvent network bandwidth and bottleneck issues.
  • Resolve policy and compliance issues.
  • Resolve notification failures or miscommunications.
  • Note maintenance period changes for renegotiation.
  • Note workload and cycles for capacity planning purposes.

Measurement: Time from 60% to 80% saturation deployment
Example: 1: 25 hours; 2: 10 days; 3: 30 days

Measurement: Time from 80% to 90% saturation deployment
Example: 1: 10 days; 2: 10 days; 3: 30 days
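The patch success ratio and the saturation-time measurements in Table 4 can be derived from per-server installation results. The following Python sketch shows one way to do this. It is an illustration only: the InstallRecord structure and the patch_success_ratio and time_to_saturation functions are hypothetical and are not part of SMS 2003 or its reporting, and the sketch assumes that "saturation" means the share of all targeted servers that report a successful install.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class InstallRecord:
    """Result of one update on one targeted server (hypothetical schema)."""
    server: str
    succeeded: bool
    completed_at: Optional[datetime] = None  # when the install finished, if it did


def patch_success_ratio(records: Sequence[InstallRecord]) -> float:
    """Fraction of targeted servers on which the update installed successfully."""
    if not records:
        raise ValueError("no install records")
    return sum(r.succeeded for r in records) / len(records)


def time_to_saturation(records: Sequence[InstallRecord],
                       test_success_at: datetime,
                       percent: int) -> Optional[timedelta]:
    """Elapsed time from test sign-off until `percent` % of all targeted
    servers report a successful install, or None if that level is never reached.
    Assumes saturation is measured against every targeted server."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer ceiling of percent * total / 100 avoids floating-point rounding.
    needed = (percent * len(records) + 99) // 100
    if needed == 0:
        return timedelta(0)
    # Completion times of successful installs, earliest first.
    finished = sorted(r.completed_at for r in records
                      if r.succeeded and r.completed_at is not None)
    if needed > len(finished):
        return None
    # The needed-th successful install marks the moment the threshold is crossed.
    return finished[needed - 1] - test_success_at


if __name__ == "__main__":
    signed_off = datetime(2004, 2, 10, 9, 0)  # hypothetical test sign-off time
    hours = [5, 10, 20, 30, 40, 75, 100, 120, 300]
    records = [InstallRecord(f"srv{i:02d}", True, signed_off + timedelta(hours=h))
               for i, h in enumerate(hours)]
    records.append(InstallRecord("srv99", False))  # one failed install
    print(f"success ratio: {patch_success_ratio(records):.0%}")
    for pct in (60, 80, 90, 100):
        print(pct, time_to_saturation(records, signed_off, pct))
```

With the sample data above, 60% saturation is reached 75 hours after sign-off, and 100% is never reached because one install failed. Intervals such as "60% to 80%" are the difference between two calls, for example time_to_saturation(records, signed_off, 80) - time_to_saturation(records, signed_off, 60). Taking the percentage as an integer keeps the threshold count exact.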

For More Information

For additional information about how to deploy, operate, maintain, and support SMS, visit http://www.microsoft.com/smserver/.

For details about Microsoft Solutions for Management (MSM) and the MOF, visit http://www.microsoft.com/technet/itsolutions/cits/mo/default.mspx.

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:

http://www.microsoft.com/

http://www.microsoft.com/technet/itsolutions/msit/default.mspx

For any questions, comments, or suggestions on this document, or to obtain additional information about Microsoft IT Showcase, please send e-mail to: showcase@microsoft.com

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Microsoft grants you the right to reproduce this White Paper, in whole or in part, specifically and solely for the purpose of personal education.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2004 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, Windows, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
