Server Security Patch Management at Microsoft
Technical White Paper
Published: February 4, 2004
Sharing the Microsoft IT Experiences
Situation
Organizations that cannot determine and maintain a known level of patch management
for operating systems and application software might have a number of security vulnerabilities
that, if exploited, could lead to loss of revenue and intellectual property.
Solution
Minimizing the threat of vulnerabilities requires organizations to have properly
configured systems, to use the latest software, and to install the recommended software
updates. Assessing and maintaining the integrity of software in a networked environment
through a well-defined patch management program is a key first step toward successful
information security. The IT group inside Microsoft uses Microsoft Systems Management
Server (SMS) 2003 as the primary tool in its server patch management process.
Benefits
- Automated deployment of security updates and applications
- Central reporting and administration
- More accurate and efficient update management
- Reduction in manual effort to patch servers
Products & Technologies
- Microsoft Windows Server 2003
- Microsoft SQL Server 2000 SP3a
- Systems Management Server 2003
- Microsoft Baseline Security Analyzer (MBSA) version 1.2
- Change management database (IT Configuration DB)
Executive Summary
Assessing and maintaining the integrity of software in a networked environment through
a well-defined patch management program is a key first step toward successful information
security.
Patch management is a process that gives organizations control over the deployment
and maintenance of interim software releases into their production environments.
It helps organizations maintain operational efficiency and effectiveness, and it
helps improve the security and stability of the production environment.
The information technology group within Microsoft (known as Microsoft IT) uses Microsoft®
Systems Management Server (SMS) 2003 to:
- Manage the application deployment process.
- Improve asset management for both hardware and software.
- Better manage mobile clients within the Microsoft network.
- Help manage the deployment of security updates across the enterprise.
This white paper focuses on Microsoft IT's early adoption of SMS 2003 to assist
with server patch management, including the deployment of security updates, in its
production environment. The technical elements covered in this paper include:
- The organization and structure of the data-center network and the levels of support
to manage data-center servers.
- The server patching architecture at Microsoft.
- The administrative structure required to support the process of managing security
updates for servers.
- The key phases of the patch management process: monitoring for security bulletins
and updates, determining the risk level, testing an update, deploying an update,
and checking reports on the success of the deployment.
- A comparison of the different timetables for standard and emergency updates.
- The lessons learned and best practices for security patch management.
Customers frequently ask Microsoft IT about the methods employed and lessons learned
when Microsoft software is deployed internally. This white paper provides an inside
look at how Microsoft IT manages security updates at the server level with SMS 2003,
including a discussion of the procedural and staffing decisions necessary to ensure
efficient server patch management.
For information about how Microsoft IT manages both security updates and application
updates for client (desktop) computers, see the IT Showcase white paper
Desktop Patch Management with Microsoft Systems Management Server 2003.
This paper is written for enterprise technical decision makers who want to take
advantage of the patch management features that SMS 2003 provides. This paper is
based on Microsoft IT operational experience as an early adopter. It is not intended
to serve as a procedural guide. Each enterprise environment is composed of unique
circumstances; therefore, each organization should adapt the plans and lessons learned
described in this paper to meet its specific needs.
Readers of this paper should be familiar with the Microsoft Operations Framework
(MOF) Process, Team, and Risk models. For more information about these models, see
http://www.microsoft.com/business/reducecosts/efficiency/manageability/default.mspx
or http://www.microsoft.com/technet/itsolutions/cits/mo/mof/default.mspx.
Note
This paper uses the term patch management
to label the process of fixing software vulnerabilities on computers in a general
sense. The term update refers to the object of the process—the package
that is deployed in patch management.
Microsoft IT's patch management framework follows Microsoft Solutions for Management
(MSM) recommendations. The framework consists of the following steps:
- Assess the environment to be patched on an ongoing basis by understanding the baseline
of system security, constantly reviewing the patch management architecture, managing
SMS client breadth and health, and conducting inventories.
- Identify new patches, determine their relevance to the environment, and verify patch
authenticity by installing each one on an isolated system, and determine enforcement
time frame.
- Evaluate and plan patch deployment to ensure that patch testing, risk assessment,
and patch release processes are all in place.
- Deploy the patch, which includes distributing and installing the patch, monitoring
and reporting on progress and success or failures of the patch, dealing with exceptions,
and conducting a review of the deployment to continue to evolve the patch management
process.
Server patch management falls under three discrete MOF processes, shown in Table
1.
Table 1 MOF Processes
Process | Process objective
Change management | Avoidance of change-induced service disruptions
Release management | Coordination and installation of correct, authorized, and tested software and hardware
Configuration management | Storage and retrieval of up-to-date configuration information, in support of activities for other processes
Introduction
The Microsoft IT group is responsible for managing IT services and a challenging
computing environment for more than 55,000 employees and more than 300,000 devices
that span over 400 sites worldwide. Over 300 of the sites are sales and marketing
offices distributed in major worldwide cities. IT-managed infrastructure exists
at over 200 of those sites.
Because Microsoft is a large enterprise that develops and markets software, the
Microsoft IT infrastructure is much larger than is typical of other corporations
with a similar number of employees, contractors, and vendors. For example, Microsoft
has two to three times more computers and other devices (such as Smartphones and
Pocket PC devices) than personnel. Microsoft IT manages more than 120,000 desktop
computers and portable computers spread among the production, product development,
test, and support organizations.
Microsoft IT consists of more than 3,500 staff members that are responsible for
managing the IT utility for the company. In addition, Microsoft IT plays a key role
in helping the company meet its main business objective of software development
and marketing. Microsoft IT serves as an early adopter of new Microsoft software,
such as Microsoft Windows Server™ 2003, Microsoft Office 2003,
and Systems Management Server 2003. This process is known internally as "eating
our own dog food."
The early deployment of technology and continual growth at Microsoft result in a
highly dynamic environment. The environment houses more than 6,000 servers that
provide essential services. These services include 1,600 line-of-business applications
that range from a single SAP R/3 instance to specialized departmental or even workgroup
applications for groups such as research, product support, and product development
in four different Active Directory® directory service forests.
Note
This paper is based on the infrastructure
of the primary production forest and excludes references to the extranet, research,
product development, testing, and support infrastructures, except where noted.
Servers in the primary production data center provide many mission-critical functions
with service level agreements (SLAs) for uptime greater than 99.9 percent. Minimizing
unplanned server downtime is a key operational and server patch management requirement.
Strictly managing the timing for planned downtime is also a key requirement, especially
for the many clustered servers. In addition, Microsoft IT manages to a goal of around
200 servers per administrator and budgets no additional headcount for the rising
trend in the number of server updates.
Additional challenges in the Microsoft security environment include:
- As many as 2,500 unique attacks, probes, and scans occur on a daily basis.
- Each month, Microsoft scans for and quarantines over 125,000 virus-infected
e-mail messages.
- Unique IT environments for product development, testing, support, and research require
special security.
- Most Microsoft employees are highly technology literate and routinely explore the
limits of the tools available to them in order to improve product quality. For example,
more than 95 percent of Microsoft employees have local administrator rights to their
desktops. Some employees even run server operating systems on their desktop computers
for various development, testing, and product support purposes. For security patch
management purposes, these computers are managed the same way as client (desktop)
computers.
This combination of factors—an evolving security landscape full of potential
vulnerabilities operating across a large, dynamic, and demanding IT environment—presents
a challenging array of variables for the server management IT function to manage.
In addition, making sure that an update—especially a security update—reaches
only its intended targets is absolutely essential so that conflicts do not arise
between the update and other software versions for which it was not intended. Microsoft
IT requires that a patch installation fix the problem without creating
side effects or negative interactions.
To help address these issues, Microsoft IT turned to SMS 2003 to help manage the
computing environment at Microsoft. SMS 2003 provides Microsoft IT with:
- Inventory functions to determine how many computers have been deployed, their locations,
their roles, and the software applications and updates that have been installed.
- Scheduling functions that allow scheduled deployment for updates outside regular
working hours, or at a time that has the least impact on business operations.
- The Distribute Software Updates Wizard, which enables administrators to rapidly
select and deploy software distributions, such as security updates, to specific
groups of computers, such as servers.
- Status reporting that enables patch administrators to monitor the progress and assess
the success of installation.
The server patching process prior to SMS 2003 was labor intensive. One or more Microsoft
IT professionals had to write a script to distribute the update. The script had
to reach only the computers for which it was intended, determine whether the update
had already been installed on each target computer, and then install the update
as needed. A new script had to be written for each update distribution. After the
update deployment script was written, it had to be tested for integrity, and then
tested again on a select representative server distribution to ensure that it would
work. Before SMS 2003, Microsoft IT had no single mechanism to assess and gather
inventories of various servers, what operating system versions and service packs
they were running, and whether they had been patched.
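The three checks that each hand-written deployment script had to perform (reach only the intended computers, skip computers that already have the update, and install on the rest) can be illustrated with a minimal Python sketch. This is not the script Microsoft IT used. The `Server` record, the `plan_update` function, and the update identifier `UPDATE-001` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical inventory record for one server."""
    name: str
    os_version: str
    installed_updates: set = field(default_factory=set)


def plan_update(servers, update_id, applies_to):
    """Return the names of servers that still need the given update.

    servers    -- iterable of Server records
    update_id  -- identifier of the update (hypothetical format)
    applies_to -- set of OS versions the update is intended for
    """
    targets = []
    for server in servers:
        if server.os_version not in applies_to:
            continue  # not an intended target, so leave it untouched
        if update_id in server.installed_updates:
            continue  # already patched, so nothing to do
        targets.append(server.name)
    return targets
```

Because a script of this kind had to be rewritten and retested for every update, and because no shared inventory existed to supply the `servers` list, the effort grew with each new distribution.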
Business Benefits of Server Patch Management with SMS 2003
Automated Security Update and Application Deployment
With the ability of SMS 2003 to enforce security update installation, Microsoft
IT can ensure that servers are updated within prescribed timeframes. By creating
server collections coordinated with the servers' time zones and optimum times for
automatic reboots, SMS 2003 enables the Microsoft IT patch administrator to automatically
update the servers in the data center with minimal impact to the business.
Central Reporting and Administration
SMS 2003 centralizes the reporting and administration of complex networks of computers
by consolidating information about network assets into a single database, including
servers, their locations, and the software running on them. This provides Microsoft
IT with a unified and up-to-date repository of the inventory of Microsoft's IT infrastructure,
combined with a centralized, unified set of tools for monitoring, administering,
and updating any combination of servers in the network. This centralization also
establishes a clear communication path that helps software deployment and security
updates reach the intended servers. It also helps ensure that the servers communicate
the results of the deployment attempt back to the SMS server.
More Accurate and Efficient Patch Management
To meet the challenge of an increasing number of updates, SMS 2003 offers a number
of features to enable deploying more updates by fewer administrators in less time.
The inventorying features enable an administrator to categorize computers into any
number of groups based, for example, on what server tasks they perform, what OS
and service packs they run, their physical location, and which time zone they share.
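As a rough illustration of grouping by inventory attributes, the following Python sketch buckets inventory records by any combination of fields. It does not use the SMS collection mechanism or its query language. The record layout and the field names (`role`, `os`, `time_zone`) are assumptions made for this example.

```python
from collections import defaultdict


def build_collections(inventory, keys):
    """Group server names by the values of the chosen inventory fields.

    inventory -- iterable of dicts, each with a "name" plus the fields in keys
    keys      -- ordered list of field names to group by
    """
    collections = defaultdict(list)
    for record in inventory:
        group = tuple(record[key] for key in keys)
        collections[group].append(record["name"])
    return dict(collections)


# Hypothetical inventory records
inventory = [
    {"name": "web01", "role": "web", "os": "Windows Server 2003", "time_zone": "PST"},
    {"name": "web02", "role": "web", "os": "Windows Server 2003", "time_zone": "GMT"},
    {"name": "sql01", "role": "database", "os": "Windows Server 2003", "time_zone": "PST"},
]

by_role_and_zone = build_collections(inventory, ["role", "time_zone"])
```

Grouping on a tuple of fields means the same inventory can be sliced by role, by time zone, or by both without changing the records themselves.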
Reduction in Manual Effort to Deploy Updates
SMS 2003 offers many automated tools to group targeted computers easily and to deploy
software updates. In addition, integration with Active Directory allows IT administrators
to reuse information in that repository for targeting users and computers. With
SMS 2003, Microsoft IT spends less time writing scripts, testing scripts, and testing
deployments. For example, before adopting SMS 2003, Microsoft IT needed three to
five IT professionals per shift, 24 hours per day, for update management. As of this
writing, using SMS 2003, one person per shift does the same job.
Background
The server patch management process at Microsoft is composed of a combination of
people, processes, and technology. Each element influences the others and arose over
time from the business needs and technology architecture at Microsoft. To best judge
whether the best practice recommendations and lessons learned in the final part
of this paper apply to you, it is useful to understand the background information
that drives the design and execution decisions.
Data-Center Structure
The three main enterprise data centers for Microsoft are in Redmond, Washington;
Dublin, Ireland; and Chofu, Japan. Enterprise data centers are placed where the
majority of employees are located. In addition, there are 16 geographically dispersed
regional data centers and approximately 400 worldwide business locations. To create
the most stable infrastructure and to reduce costs, Microsoft IT chose to centralize
IT operations in the Redmond data center.
A seven-member team of IT operations personnel is responsible for provisioning servers
to these three data centers, and they provision an average of 200 new servers each
month.
Three levels of service are available to the owners of managed servers in the data
center, as shown in Table 2.
Table 2 Service Offerings for Managed Servers at Microsoft
Level | Servers | Description
One | ~700 | Power, cooling, and network taps.
Two | ~2,000 | Power, cooling, and network taps. Data backup support. Reactive support, whereby the server owner calls Help Desk when the server is not operating properly, and then Data Center Operations (Ops) is notified and takes action.
Three | ~6,000 | Power, cooling, and network taps. Data backup support. Full management: proactive support of the server hardware and the operating system through Microsoft Operations Manager (MOM), including monitoring for update compliance, as well as automated server provisioning with up-to-date security updates.
Full management (service level three) is critical for application servers that run
the business and the core infrastructure, such as file and print servers; proxy
servers; remote access servers; and servers that run Active Directory, DNS, and
WINS. Service level two is often chosen for test, support, and product development
lab servers. Regardless of the service level chosen, each server owner is responsible
for managing and maintaining servers at the application level and higher—for
example, user rights.
At Microsoft, the server owners must use approved versions of server software and
the latest updates. The server owners must also use hardware that is manufactured
by approved vendors according to specifications. This standardization keeps costs
down, improves reliability of the platforms, increases service availability, and
supports centralized and remote monitoring and management.
Operations Model
When a problem with a fully managed server is identified at Microsoft, the problem
is escalated as follows:
- Tier 1, Help Desk. Most application server issues are detected
at Tier 2. However, if the server owner or an application user identifies the problem,
he or she contacts Help Desk. Hardware issues for regional servers are handled locally.
Regional Help Desk technicians perform any needed hands-on server operations and
provide first-line support to the user community in their native language.
- Tier 2, Data Center Operations. Ops uses Microsoft Operations
Manager alerts to proactively monitor servers for problems so that many problem
reports circumvent Help Desk. However, if the server owner identifies the problem
and contacts Help Desk, Help Desk then contacts the Ops or messaging operations
team for further action. When Ops is alerted, it handles the initial response, spending
a defined amount of time (such as 15 minutes) on the issue by using an internally
developed Troubleshooting Guide (TSG). Ops also initiates the trouble ticket in
an internally developed ticketing system. This system integrates the ticket tracking
application with several knowledge management functions, such as the product group
knowledge base, TSGs, and other internal resources. The ticketing system is used
to manage the incident life cycle from detection and recording to investigation,
diagnosis, and resolution.
TSGs are created when an issue is common and the resolution is known and easily
implemented. If Ops cannot resolve the issue, it escalates according to instructions
in each TSG for investigation and root-cause resolution. TSGs are linked to alerts
in the custom ticketing application.
- Tier 3, Infrastructure Support (IS) and Advanced Diagnostics and Debug (ADD)
teams. Depending on the nature of the problem, Ops can contact
either IS or ADD. IS provides end-to-end measurement and management of core infrastructure
services. ADD specializes in debugging Microsoft Windows® operating system issues
and communicates directly with the product development groups.
- Tier 4, Engineering. IS contacts Engineering if resolving
the problem involves modifying the IT architecture, hardware standards, or software
standards. For example, Engineering creates holistic standard platforms that facilitate
server provisioning and periodic maintenance updates so that all new and existing
servers are patched up to current standards.
Microsoft IT upgraded its desktop patch management infrastructure from Microsoft
SMS version 2.0 to SMS 2003. The purpose of this upgrade was twofold: to test the
upgrade to SMS 2003 in a production environment before release to the public, and
to deploy a comprehensive solution for providing relevant software and updates to
employee desktop computers quickly and cost-effectively.
Microsoft IT examined patching requirements at Microsoft and decided to create separate
SMS 2003 infrastructures: one to patch servers and one to patch desktop computers.
Microsoft IT based this decision on the following factors:
- The priorities for patching servers and desktop computers are different.
- There are more than 10 times as many desktop computers as servers, yet the desktop
computers depend on the reliability and accessibility of the servers to fully function.
- The corporate standard configuration (called the baseline) for servers at Microsoft
is much more uniform and stable than for desktop computers. In addition, the rate
of change in the server population is smaller than that for the desktop population,
though high compared to many enterprises. For example, around 5,000 desktop computers
per month are rebuilt or reimaged for development and testing purposes. Newly provisioned
or rebuilt servers in the production data centers average 200 per month. Existing
servers are baselined every six months to one year with a standard maintenance update
process.
- Early adopter product testing required Microsoft IT to implement both an upgrade
to the existing SMS 2.0 infrastructure for desktop patching and a "greenfield" deployment
to the new server patching infrastructure. The new deployment for server patching
also had to interoperate with existing internal server patching tools, server build
procedures, and change management procedures.
The SMS hierarchy for desktop computers consists of two distinct site server roles:
primary and secondary. A primary site server has a database running Microsoft SQL
Server™ 2000 Service Pack 3a (SP3a), but a secondary site server does not. To ensure
peak performance of security update and software update installation, the Microsoft
IT server patching hierarchy contains only primary site servers.
SMS sites are also described by their management relationship to other sites in
the hierarchy, as follows:
- Central site. The central site is the primary site at the
top of the hierarchy; it is the one site in a hierarchy that is not a child to any
other site. Therefore, the SMS site database at the central site acquires the data
of the entire hierarchy. The central SMS site database stores the inventory information
for the central site and all of its subsites.
- Parent site. A parent site is a primary site that includes
at least one other site beneath it in an SMS hierarchy. Only a primary site can
have child sites.
- Child site. A child site is a site that reports to a site
above it in the hierarchy. SMS copies all information collected at a child site
to the parent site, which in turn reports all of the accumulated data to its parent
site. The Microsoft IT design has no sites below its primary sites, so none of the
primary site servers beneath the central site constitute parent sites.
Figure 1 shows the SMS 2003 server architecture at Microsoft for server patch management.
Figure 1 SMS server patching architecture at Microsoft
Inside Microsoft, the SLA requirement for installation of critical security updates
for servers is less than 24 hours. The SLA for noncritical updates for servers is
less than seven days.
The server patching design used inside Microsoft uses only two server tiers. The
design drivers are:
- The primary sites correspond with the underlying network infrastructure bandwidth.
- Servers must be patched in the shortest possible time.
However, Microsoft IT recommends that organizations examine all network performance
implications when designing an SMS infrastructure in a bandwidth-constrained environment.
The servers in the Microsoft IT design are four-CPU, 1.5-gigahertz (GHz) computers
with 2 gigabytes (GB) of random access memory (RAM), a 34-GB hard disk drive for
SMS, and a 34-GB hard disk drive for the database.
Server Patch Management Process
Patching thousands of servers is a cross-team cooperative effort that involves the
following specialized teams and roles at Microsoft:
- Microsoft Security Response Center (MSRC). The MSRC investigates
product issues that are reported directly to Microsoft, as well as issues discussed
in certain popular security newsgroups. The MSRC releases security bulletins about
vulnerabilities.
- Corporate Security Compliance. The Corporate Security team
(within Microsoft IT) reviews the bulletins posted by the MSRC and assigns a deployment
priority to them according to internal security needs. For example, the threat may
not be as imminent to Microsoft because of Microsoft's implementation of previous
updates or of protective firewalls. The security analyst within Corporate Security
recommends the enforcement dates for updates and facilitates the flow of information
between the Corporate Security and Data Center Operations organizations.
- Data Center Operations. The Ops team manages the data centers
and hosts the SMS infrastructures. The SMS patch administrator within Ops creates
and prepares the update deployment package and distributes the update according
to the recommended target computers and enforcement date. The patch administrator
uses the SMS 2003 Distribute Software Updates Wizard to ensure that the right updates
reach the right computers within the prescribed time. The patch administrator adds
the updates to the next scheduled software distribution, authorizes the update,
sets the update properties to have the proper enforcement date (according to the
severity), and sends the update to the targeted groups of servers for distribution.
Microsoft IT follows two schedules for deploying updates to servers:
- Emergency updates that have a rating of Critical are deployed to servers in less
than 24 hours.
- Updates of any lower rating, known as standard updates, are deployed monthly.
In both cases, Microsoft IT uses automatic patch enforcement in SMS 2003. If the
local server administrator does not install the update within the allotted time,
SMS automatically installs the update after the time elapses and forces a restart
on the server to effect the changes.
Note
There are a few mission-critical servers
that Microsoft IT has exempted from forced compliance because the server administrator
must have complete control over when the server restarts. In these cases, however,
the administrator is still expected to perform any critical security update within
24 hours of deployment. If the administrator has not performed the patch by the
deadline, SMS automatically enforces the patch.
Figure 2 shows the overall process flow for server patching on the standard timeline
and the teams that perform each step. The same teams follow the same
steps for a critical update, but on a compressed timeline.
Figure 2 Server patching process flow at Microsoft
Note
There is no change management decision
point after analysis and before testing. For server security updates, the decision
to deploy is preapproved, so the only remaining process decision is how fast to deploy the
updates.
Phase 1: Monitoring for Security Bulletins and Updates from Microsoft
The release of a security bulletin from the MSRC kicks off the process. MSRC bulletins
include details that describe vulnerabilities and the products that they affect.
The bulletins also include detailed technical information describing vulnerabilities,
updates, and workarounds, in addition to deployment considerations and download
instructions for any available updates.
Security bulletins and related security updates are released monthly between 10:00
A.M. and 11:00 A.M. Pacific Time, unless Microsoft determines that customers will
be better served by releasing a security bulletin at a different time. This policy
was established in response to international customer feedback, with the purpose
of better enabling customers to plan and schedule patch management activities. Although
the Microsoft IT team is involved in the development of the update before the bulletin
is released, the process of deploying the update to all servers begins only after the
update is released, just as it does for other Microsoft customers.
All security bulletins and other information about Microsoft product security are
available at http://www.microsoft.com/technet/security/default.mspx.
All security updates included in the last two service packs for all currently supported
products are available for download from this location.
The Microsoft Security Response Center offers update notification through e-mail subscription
services. For customers who have more extensive knowledge of or interest in the
technology behind security updates, Microsoft TechNet offers the Microsoft Security
Notification Service, a free e-mail notification service. These e-mail messages
are geared toward IT professionals and contain in-depth technical information. For
more information or to sign up to receive the Microsoft Security Notification Service,
see http://www.microsoft.com/technet/security/bulletin/notify.mspx.
The Corporate Security team uses this notification service to stay apprised of update notifications.
Phase 2: Determining the Risk Level
The MSRC rates the urgency of updates according to the severity ratings shown in
Table 3.
Table 3 MSRC Update Rating
Rating | Definition
Critical | A vulnerability whose exploitation could allow the propagation of an Internet worm without user action.
Important | A vulnerability whose exploitation could result in compromise of the confidentiality, integrity, or availability of users' data, or of the integrity or availability of processing resources.
Moderate | Exploitability can be mitigated to a significant degree by factors such as default configuration, auditing, or difficulty of exploitation.
Low | A vulnerability whose exploitation is extremely difficult or whose impact is minimal.
The Corporate Security team adapts the MSRC Maximum Severity Rating System to determine
the internal deployment schedule. Emergency critical updates are sent out on a "zero
day" schedule that seeks to update all servers within 24 hours. Updates rated Important,
Moderate, or Low are all added to the standard monthly automatic update distribution.
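The mapping from rating to deployment schedule is simple enough to express as a short Python sketch. It assumes the input is the rating after Corporate Security has reviewed and, if necessary, adjusted it. The function names and the `"emergency"` and `"standard"` labels are illustrative, not part of SMS 2003 or any Microsoft tool.

```python
from datetime import datetime, timedelta

# Target window for emergency deployment, taken from the text above
EMERGENCY_WINDOW = timedelta(hours=24)

STANDARD_RATINGS = {"Important", "Moderate", "Low"}


def deployment_schedule(rating):
    """Map an internal severity rating to a deployment schedule label."""
    normalized = rating.strip().capitalize()
    if normalized == "Critical":
        return "emergency"
    if normalized in STANDARD_RATINGS:
        return "standard"
    raise ValueError(f"unknown rating: {rating!r}")


def emergency_deadline(released_at):
    """Latest time by which all servers should have an emergency update."""
    return released_at + EMERGENCY_WINDOW
```

For example, `deployment_schedule("Moderate")` places the update in the standard monthly distribution, while a Critical update gets a deadline 24 hours after its release time.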
Automated Scanning for and Reporting of Vulnerable Computers
After the deployment timeline is set, Microsoft IT must determine which servers
are or might be vulnerable.
The Security Update Inventory Tool provided in the SMS Software Update Services
Feature Pack extends the SMS hardware inventory to report on the security updates
that are missing on a set of SMS clients (servers). Clients compare what has been
installed against the list of available updates contained in the Extensible Markup
Language (XML) file downloaded from the Microsoft Security website.
The Security Update Inventory Tool also includes the Microsoft Baseline Security
Analyzer (MBSA), version 1.2 as of this writing, as the update scanning engine.
Microsoft IT uses MBSA to scan for missing and installed updates on local and remote
computers. MBSA can also scan all the computers in a given domain for missing and
installed updates. Microsoft IT uses the results of these scans to identify
missing updates.
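The comparison at the heart of this scan is a set difference between the updates a catalog lists for a product and the updates a client reports as installed. The Python sketch below illustrates that idea only. The XML layout, element and attribute names, and update identifiers are invented for this example and do not reflect the actual catalog file format or the MBSA implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog layout, invented for illustration
CATALOG_XML = """
<catalog>
  <update id="UPDATE-001" product="Windows Server 2003"/>
  <update id="UPDATE-002" product="Windows Server 2003"/>
  <update id="UPDATE-003" product="SQL Server 2000"/>
</catalog>
"""


def available_updates(catalog_xml, products):
    """Return the set of update IDs the catalog lists for the given products."""
    root = ET.fromstring(catalog_xml.strip())
    return {
        update.get("id")
        for update in root.iter("update")
        if update.get("product") in products
    }


def missing_updates(catalog_xml, products, installed):
    """Return a sorted list of applicable updates that are not installed."""
    return sorted(available_updates(catalog_xml, products) - set(installed))
```

A server running only Windows Server 2003 with `UPDATE-001` installed would be reported as missing `UPDATE-002`, and the SQL Server update would be ignored as not applicable.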
For updates that are not available or are not supported by MBSA, Microsoft IT uses
the standard software distribution feature in SMS to deploy the update packages
to servers.
Phase 3: Testing
No matter how much testing is performed, rolling out an update into production sometimes
produces effects that can never be replicated in a lab or test environment. To avoid
negative impacts on a large number of servers, Microsoft IT can create a reference
collection within SMS 2003 that contains a representative sample of all permutations
of the production servers.
There are two primary forms of testing for successful update deployment:
- Test the production environment's implementation of update distribution in terms
of connectivity and reporting
- Test the update itself for installation on computers representative of the production
environment
To test the SMS 2003 environment, the inventory, the collections, and connectivity,
Microsoft IT deploys a synthetic patch to one or more collections within the production
environment. A synthetic patch is an inert payload. When an administrator deploys
a distribution package that contains only a synthetic patch, it generates all the
SMS 2003 reports and status messages that indicate how successfully the patch
reached all targeted computers. Yet, the synthetic patch has no impact on the production
environment; it does not alter the target computers or force any restarts.
Phases 4-7: Deploying the Patch
Standard Critical Update Deployment
For deployment of standard updates, 24 work periods or maintenance periods of four
hours each are scheduled over the course of four days. Data-center server owners
map each of their servers to a specific maintenance period within the overall work
schedule in an internal change management database called IT Configuration. The
SMS 2003 patch administrator then uses these groupings of servers to target delivery
of the update, so that any restarts required are within the approved maintenance
time period. Note that Microsoft IT recommends that clustered server owners place
each server in a cluster into a different maintenance period, so that the entire
cluster is not patched at the same time.
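The arithmetic behind this schedule (24 periods of 4 hours each is 96 hours, or four days) can be sketched in Python. This is an illustrative sketch only: the start date and the function name are assumptions, and the code does not reflect how the IT Configuration database stores maintenance periods.

```python
from datetime import datetime, timedelta

PERIOD_HOURS = 4   # from the text: four-hour maintenance periods
PERIOD_COUNT = 24  # from the text: 24 periods over four days

def work_periods(start):
    """Return (number, begin, end) tuples for the 24 standard work periods."""
    length = timedelta(hours=PERIOD_HOURS)
    return [(n + 1, start + n * length, start + (n + 1) * length)
            for n in range(PERIOD_COUNT)]

# Assumed start: midnight on a Thursday (hypothetical date).
periods = work_periods(datetime(2004, 2, 12, 0, 0))
for number, begin, end in periods[:3]:
    print(number, begin, end)
```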
Figure 3 shows the work-period breakdown for an example four-day patching timeline.
Figure 3 Twenty-four work periods for standard deployment
Emergency Critical Update Deployment
For emergency critical updates, there are only four work periods of one hour each.
Each emergency work period maps to an entire day in the standard timetable. After
the four one-hour work periods, Microsoft IT allots an additional three hours to
check the success rate of the installation and to patch any remaining servers, so
that all servers are patched within seven hours.
Figure 4 shows how the four-hour critical update deployment periods overlay the
standard work-period breakdown for the example four-day patching timeline.
Figure 4 Four work periods for emergency deployment
For example, if the server administrator has mapped the server into the IT Configuration
database with a maintenance period of Saturday at 4:00 A.M. for the standard deployment
timeline, the server is patched on the second Saturday of the month beginning at
4:00 A.M. However, the same server will be patched in the Hour 3 maintenance period
on an emergency deployment timeline.
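The mapping in this example can be expressed as integer division. The sketch below assumes that the 24 standard periods are numbered 1 through 24 in chronological order and that the standard timeline begins at midnight on a Thursday; both are assumptions made for illustration, because the figures are not reproduced here. Under those assumptions, the Saturday 4:00 A.M. period is period 14, which maps to Hour 3, matching the example above.

```python
PERIODS_PER_DAY = 6  # 24 hours divided into four-hour standard periods

def emergency_hour(standard_period):
    """Map a standard work period (1-24) to an emergency period (Hour 1-4)."""
    if not 1 <= standard_period <= 24:
        raise ValueError("standard period must be between 1 and 24")
    # Each emergency hour covers one full day of standard periods.
    return (standard_period - 1) // PERIODS_PER_DAY + 1

# Period 14 is the assumed Saturday 4:00 A.M. slot (third day, second slot).
print(emergency_hour(14))
```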
Phase 8: Reporting
There are a number of tools that a patch administrator can use to check the status
messages returned from SMS clients after deploying a package by advertising. The
patch administrator can use the Advertisement Status Viewer to ascertain:
- The number of clients in the collection that have not received the advertisement.
- The number of clients that have received the advertisement but have not run it.
- The number of clients that have run the program unsuccessfully.
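As a rough illustration of the counts listed above, the following Python sketch tallies clients by advertisement state. The state labels and the input format are hypothetical; SMS records this information in its own status messages, which this sketch does not attempt to parse.

```python
from collections import Counter

# Hypothetical state labels, for illustration only.
NOT_RECEIVED = "not_received"
RECEIVED_NOT_RUN = "received_not_run"
FAILED = "failed"
SUCCEEDED = "succeeded"
STATES = (NOT_RECEIVED, RECEIVED_NOT_RUN, FAILED, SUCCEEDED)

def summarize(statuses):
    """Count clients per state; `statuses` maps client name -> state."""
    counts = Counter(statuses.values())
    return {state: counts.get(state, 0) for state in STATES}

example = {"SRV01": SUCCEEDED, "SRV02": FAILED,
           "SRV03": NOT_RECEIVED, "SRV04": SUCCEEDED}
print(summarize(example))
```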
To check that the Update Install Program is running successfully on clients, the
patch administrator analyzes the status messages to determine when the program was
last successfully run on each client. The patch administrator investigates any delays,
which can occur if, for example, an SMS client is not turned on or is not functioning
correctly.
Status messages also record the degree of voluntary versus enforced patching, and
how the server administrators are managing restarts and scheduled installation.
Based on this data, the patch administrator adjusts enforcement and default settings
for the next round of patches to bring computers into compliance more efficiently.
For follow-up, the patch administrator uses the Compliance by Software ID report
in SMS to obtain a summary of the total number of systems for which an update is
installed and missing, as well as the status relating to update distribution. This
report helps identify the current compliance levels for a particular update across
the production environment.
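The kind of summary that such a report provides can be approximated as follows. This is a minimal sketch with a hypothetical input format (server name mapped to an installed flag); it does not reproduce the actual report or query the SMS database.

```python
def compliance_summary(update_state):
    """Summarize installed and missing counts for one update.

    `update_state` maps server name -> True if the update is installed.
    """
    total = len(update_state)
    installed = sum(1 for ok in update_state.values() if ok)
    percent = 100.0 * installed / total if total else 0.0
    return {"installed": installed,
            "missing": total - installed,
            "percent_compliant": round(percent, 1)}

print(compliance_summary({"SRV01": True, "SRV02": True,
                          "SRV03": False, "SRV04": True}))
```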
Figure 5 shows the cost and server impact of the timetable for update deployments
from notification through final enforcement and follow-up. The figure also shows
the escalating potential impact to server operations.
Figure 5 Cost/impact over time
Lessons Learned and Best Practices
A number of lessons learned and best practices arose from Microsoft IT's experience
with the implementation of SMS 2003-based server patch management, as described
in the following sections.
Establish a Change Advisory Board
Microsoft IT recommends the formation of a change advisory board (CAB) composed
of representatives from areas of the business that would be affected by the security
threat or the installation of a software update. CAB members should include individuals
who have experience in the specific technologies and services that will be used
to deploy the update, in addition to representatives from the business, network,
security, service desk, and technical support teams.
The CAB should form an emergency committee whose task would be to quickly authorize
critical updates—those designed to close security vulnerabilities or avoid
critical system failures. The emergency committee should be composed of people with
the right background and operational authority to approve emergency changes, and
who are available to make quick decisions.
To Control Planned Downtime, Use a Change Control Database
The Microsoft IT server administrators use an internally developed change control
database (IT Configuration) to designate a specific maintenance period in the timeline
for their servers. This system ensures that any downtime from a necessary restart
is minimized, and it enables the local server administrator to plan ahead to work
around any issues. The time period selected differs by server, geography, and business
needs.
Targeting updates by using SMS for distribution to servers according to the designated
maintenance periods minimizes service disruption. Administrators of clustered servers
should place each cluster node in a separate maintenance period in the database,
to avoid negative impacts on the entire cluster.
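A simple consistency check can flag clusters whose nodes share a maintenance period. The sketch below is illustrative only; the tuple layout and names are assumptions and do not describe the schema of the IT Configuration database.

```python
from collections import defaultdict

def clusters_sharing_a_period(assignments):
    """Return {(cluster, period): [nodes]} for periods used by 2+ nodes.

    `assignments` is an iterable of (cluster, node, period) tuples.
    """
    by_slot = defaultdict(list)
    for cluster, node, period in assignments:
        by_slot[(cluster, period)].append(node)
    return {slot: nodes for slot, nodes in by_slot.items() if len(nodes) > 1}

example = [("SQLCLUSTER", "NODE-A", 3), ("SQLCLUSTER", "NODE-B", 3),
           ("WEBCLUSTER", "NODE-X", 1), ("WEBCLUSTER", "NODE-Y", 2)]
print(clusters_sharing_a_period(example))
```

An empty result means that every cluster node has its own maintenance period.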
Streamline the SMS 2003 Installation
To make SMS 2003 easier to administer and faster to run, Microsoft IT recommends
enabling only the SMS features that the administration team will use. This kind of
installation not only makes local SMS administration simpler and faster, it also
significantly reduces resource requirements throughout the production environment,
including network traffic volume, memory, and storage requirements for site servers,
distribution points, and management points.
Aggressively Monitor and Manage SMS Clients
Aggressively monitor and manage your clients. A server without a healthy SMS Advanced
Client cannot be patched by SMS. Even the servers that are marked as exceptions
to the regular patch management process at Microsoft have the SMS client installed
and report status. However, at any given time, some clients will not be reporting
status. You should investigate and resolve these
issues. For example, at Microsoft the most frequently occurring reason for a missing
client status report is that the server was momentarily unavailable to a network
PING request at the time the report was run.
Suspend Monitoring During Patching
Microsoft IT uses Microsoft Operations Manager (MOM) to monitor servers. To suppress
thousands of unnecessary event alerts, MOM monitoring is turned off immediately before
patching and re-enabled after patching.
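One way to make sure that monitoring is always re-enabled, even when patching fails, is to wrap the patching step in a context manager. The following Python sketch shows the pattern only. The `disable` and `enable` callables are hypothetical hooks supplied by the caller; they are not MOM or SMS APIs.

```python
from contextlib import contextmanager

@contextmanager
def monitoring_suspended(server, disable, enable):
    """Disable monitoring for `server` and always re-enable it afterwards."""
    disable(server)
    try:
        yield
    finally:
        enable(server)  # runs even if the patching step raises an exception

# Usage with stand-in hooks that only record what happened.
events = []
with monitoring_suspended("SRV01",
                          disable=lambda s: events.append(("off", s)),
                          enable=lambda s: events.append(("on", s))):
    events.append(("patch", "SRV01"))
print(events)
```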
Make Status Self-Serve
To enable local server administrators to check the patching status of their servers,
Microsoft IT built an internal website and tool called Serverpmstatus. After the
server administrator provides valid authentication credentials, the website queries
the database used for change and configuration management and returns a list of
servers and status for which the administrator has Owner, Authorizer, or Notify
permissions. If the status shows that the server is vulnerable, the administrator
can manually patch the server or wait for the automated patch to be applied during
the work period defined in the database.
Status on the Serverpmstatus website is updated every four hours. Figure 6 shows
an example screen shot of the tool.
Figure 6 Example patching status report
This example shows that the internal Corporate Security scans and SMS vulnerability
scans are staggered for maximum effectiveness. The listing for the MS03-051 patch
appears as missing in the SMS scan because it is not supported by MBSA 1.0. Security
updates that are not supported by MBSA 1.0 are still patched during the defined
maintenance period.
Communicate the Rollout Schedule to the Organization
The patch administrator should send a clear and easily identifiable e-mail message
to server administrators, informing them about the update and providing information
about how to install it. This mail should be flagged for follow-up to remind administrators
of the actions they need to take.
Assign Software Distribution Points
After the package has been imported into SMS, Microsoft IT decides which distribution
points should be used to make the update available.
In general, updates will be deployed to the same groups of servers each time, and
so the same distribution points will be used for each update. For example, updates
for servers should go to all distribution points in the data center's site. The
SMS patch administrator can then set up distribution point groups that contain only
distribution points for particular ranges of servers. Use of distribution point
groups expedites the process of assigning distribution points to updates being deployed.
The SMS patch administrator should use the inventory information within the SMS
database to identify where new distribution points are needed. Note, however, that
simply adding a distribution point to a distribution point group does not cause
the package to be sent to the new distribution point, even if the Update Distribution
Points option is used. New distribution points should be added one by one
to the package, and then the distribution points can be updated.
Stage Updates on Distribution Points
After the appropriate distribution points have been assigned, administrators should
ensure that copies of all the individual files are distributed to these servers.
Use the SMS status system to monitor the progress of distribution of the update
files.
Monitor Bandwidth When Sending Updates Between SMS Sites
For normal update distribution, to avoid overloading your network, limit either
the amount of network bandwidth used or the times of day that transmission can occur
when sending instructions, software packages, and advertisements between sites.
SMS enables an enterprise to define package priority as High, Medium, or Low. Microsoft
IT reserves the High priority for critical updates only.
For emergency updates, Microsoft IT lifts all intersite restrictions and allows
updates to be sent to other sites as quickly as possible. If network links between
sites are slow or are already congested, lifting restrictions on the intersite sender
has no effect. In these cases, Microsoft IT considers sending the update to each
site by using the SMS courier sender.
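A rough transfer-time estimate can help decide whether an intersite link is fast enough or whether an alternative such as the courier sender is worth considering. The sketch below is simple arithmetic under stated assumptions (1 MB is treated as 8,000 kilobits, and protocol overhead is ignored); the function and parameter names are hypothetical and are not SMS settings.

```python
def transfer_hours(package_mb, link_kbps, usable_fraction=1.0):
    """Estimate the hours needed to send a package over a link.

    package_mb: package size in megabytes (1 MB = 8,000 kilobits here)
    link_kbps: nominal link speed in kilobits per second
    usable_fraction: share of the link the sender may use (0 < f <= 1)
    """
    if link_kbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("link speed and usable fraction must be positive")
    seconds = package_mb * 8000 / (link_kbps * usable_fraction)
    return seconds / 3600

# 100 MB over a 256 kbps link limited to half its capacity.
print(round(transfer_hours(100, 256, 0.5), 2))
```

With these example numbers, the estimate is about 1.7 hours; lifting the restriction (a usable fraction of 1.0) halves it.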
Select Deployment Groups
When administrators use the Distribute Software Updates Wizard to distribute a
new update, they do not have to target computers precisely. The wizard deploys a
smart agent to the client, which is invoked when a new update is to be installed.
This agent automatically determines whether an update advertised through the wizard
is applicable to that computer and whether it has already been installed. It also
handles chaining multiple updates and the restarts needed to make the update current.
If administrators do not use the Distribute Software Updates Wizard, and instead
distribute the updates through a custom package and collection, they create a distribution
list by creating one or more SMS queries.
Advertise the Update to Client Computers
When administrators use the Distribute Software Updates Wizard, a repeating advertisement
is automatically created to run the update installation agent on computers in the
target collection. The repeat interval can be altered from the default of seven
days, as appropriate for the collection. If different schedules are needed for different
types of computers, multiple advertisements can be created for multiple collections,
through the same package and program. If the repeat interval for the running of
the update installation agent is set to daily and it needs to be run sooner for
the rollout of a critical update, a new, one-time, mandatory assignment should be
made for the advertisement to run as soon as possible.
Sometimes, Microsoft IT must distribute updates through a custom package and collection,
such as the January 2004 Microsoft Data Access Components (MDAC) Security Update
832483.
Test the Impact of the Update
No matter how much testing is performed, rolling out an update into production often
produces effects that can never be replicated in a lab or test environment. To avoid
negative impacts on a large number of servers, create a reference collection within
SMS 2003 that contains a representative sample of all permutations of the organization's
servers. This is an efficient way to test whether the update will be successfully
installed on all platforms in the organization.
Initially, test the update deployment's basic functionality. Then, gradually add
levels of complexity at each successive stage. Document the results at the completion
of each test phase and verify the findings against the project requirements. Investigate
and resolve any problems before moving forward.
Model the test lab on the organization's production environment. If the organization
uses standard client and server hardware configurations, use these configurations
in the lab. As far as possible, use the same hardware, software, network, logon
scripts, and other technologies used in the production environment. If the production
environment includes computers with nearly full disks, obsolete and possibly unused
software, or an assortment of different network adapter cards, install some lab
computers with the same characteristics. If routers or slow links connect production
networks, duplicate these conditions in the lab. Some organizations use server backups
restored to unused or outdated server hardware for this purpose.
Deploy updates in timed phases to avoid stressing the entire network's bandwidth.
In general, deploy updates by time zone to match the off-peak usage of the network
so that more bandwidth is available for distributing the updates.
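Deploying by time zone means that the same local off-peak hour translates into a different UTC start time for each site. The following Python sketch shows that conversion. The site names and fixed UTC offsets are hypothetical, and daylight saving time is deliberately ignored to keep the example short.

```python
from datetime import datetime, timedelta, timezone

def utc_start_times(local_start, site_offsets):
    """Convert one local off-peak start time into a UTC start time per site.

    `local_start` is a naive datetime (the wall-clock time at each site);
    `site_offsets` maps site name -> UTC offset in hours.
    Returns a dict ordered from the earliest to the latest UTC start.
    """
    schedule = {}
    for site, offset in site_offsets.items():
        local = local_start.replace(tzinfo=timezone(timedelta(hours=offset)))
        schedule[site] = local.astimezone(timezone.utc)
    return dict(sorted(schedule.items(), key=lambda item: item[1]))

# Hypothetical sites, each starting at 2:00 A.M. local time.
schedule = utc_start_times(datetime(2004, 2, 14, 2, 0),
                           {"West": -8, "Central": 0, "East": 8})
for site, start in schedule.items():
    print(site, start.isoformat())
```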
If a server has an absolutely essential function and peak demand is not regular
or predictable, consider exempting that server from a forced update. There are certain
servers at Microsoft that have such an exemption. The administrators are still obligated
to update within the deployment period, but the SMS distribution does not force
an update on those servers at the end of the period.
In most cases, however, the following practices minimize the impact on
servers:
- Use the persistent icon on all deployments.
- Base the period of forced update installation on the servers' periods of low demand.
- Use an automatic, periodic distribution that bundles several updates together.
Most servers have peak and off-peak periods of activity. When setting up times for
forced patch installation, set the forced update time to coincide with the server's
off-peak hours. Information about off-peak hours should be available in the change
management database.
Establish Enforcement Policy
Microsoft IT's policy is to require the deployment of emergency critical security
updates within 24 hours and standard critical security updates monthly. If the local
server administrator does not comply within that time period, the SMS distribution
program automatically updates the server and restarts it to effect the changes.
This process provides administrators reasonable time to coordinate the restart with
business needs, but also ensures that the update is installed. On average, around
2% of the owners of managed servers patch their servers themselves before the deadline.
If a server is not brought into compliance within the compliance window for a security
update, a Microsoft IT administrator disables that server's network port. Although
this action shuts down all throughput related to that server, it is preferable to
that server propagating a virus or worm to its clients and beyond. The server administrator
then contacts the helpdesk to start the process that ensures the update is installed,
the server is restarted, and the port is re-enabled as soon as possible.
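The escalation described in this section can be summarized as a small decision function. The sketch below is one reading of the policy, not Microsoft IT's implementation: it assumes that the voluntary deadline comes before the end of the compliance window and that both timestamps are supplied by the caller. The action names are hypothetical.

```python
from datetime import datetime

def enforcement_action(patched, now, deadline, window_end):
    """Pick the next enforcement step for one server.

    `deadline` is when voluntary patching ends; `window_end` is the end
    of the compliance window (assumed to be later than `deadline`).
    """
    if patched:
        return "none"
    if now < deadline:
        return "wait"           # the administrator may still patch voluntarily
    if now < window_end:
        return "force_install"  # automated installation and restart
    return "disable_port"       # still unpatched after the compliance window

deadline = datetime(2004, 2, 13, 0, 0)    # hypothetical dates
window_end = datetime(2004, 2, 16, 0, 0)
print(enforcement_action(False, datetime(2004, 2, 14, 9, 0), deadline, window_end))
```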
Plan Disaster Recovery
The Microsoft IT SMS implementation consists of dedicated stand-alone SMS infrastructure
in addition to the SMS service running on key infrastructure platforms. Disaster
recovery steps need to account for this implementation, because other services on
an infrastructure platform may need to be reinstalled and reconfigured prior to
SMS 2003 Advanced Client installation. Microsoft IT handles disaster recovery by
using the automated server build process and tools, but customers without an automated
process should ensure that software is restored in the proper order. For example,
if some of your secondary sites reside on domain controllers in regional tail sites,
the domain controller should be restored first, and then the SMS site should be
reinstalled. The disaster recovery steps for dependent services should be documented
in the business continuance plans.
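When restore dependencies are documented as data, a topological sort yields a valid restore order. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) and encodes only the dependency named in the example above; the service names are hypothetical labels.

```python
from graphlib import TopologicalSorter

# Each key depends on the services in its set (hypothetical labels).
# From the text: the domain controller is restored before the SMS site.
dependencies = {
    "sms_secondary_site": {"domain_controller"},
}

restore_order = list(TopologicalSorter(dependencies).static_order())
print(restore_order)
```

Adding further services and their prerequisites to the dictionary extends the same approach; `TopologicalSorter` raises `CycleError` if the documented dependencies are circular.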
Implement the SMS 2003 Advanced Client Throughout the Enterprise
The Advanced Client supports several features in SMS 2003 that provide advantages
in patch management, including:
- Ability to implement Advanced Security.
- Compatibility with Active Directory, which makes inventorying and patch management
quicker and more reliable. Implementing Advanced Security is a prerequisite for
using Active Directory with schema extensions in SMS 2003.
- More automated update deployment. Legacy clients require manual installation or
script installation and are limited in their ability to find paths to source software
required for some update installations.
- Improved ability to generate reports and status messages (compared with legacy clients),
making the administrator's tasks of gathering patch status and metrics easier.
- Support for the MBSA version included with SMS 2003, which enables administrators
to ascertain software revision levels on all Advanced Client servers and desktop
computers throughout the network. Legacy clients do not support this MBSA version.
Create the Appropriate Positions and Teams
Microsoft IT recommends that the IT department of a large enterprise have
an administrator dedicated solely to patch management. Patch management requires:
- Complete knowledge of the update information available.
- Processes to assess, configure, test, and deploy updates.
- The ability to interact with security departments and coordinators.
- Thorough knowledge of the IT infrastructure.
- Mastery of SMS 2003 tools.
- Knowledge and time to test update deployments.
- Examination of reports and status messages to ascertain success and troubleshoot
failure.
In larger organizations, another person or committee should evaluate and prioritize
updates regarding urgency and applicability to the enterprise's IT infrastructure.
The patch management team in an enterprise should possess the following minimum
levels of certification and skills:
- Microsoft Certified Professional (MCP) or Microsoft Certified Systems Engineer (MCSE)
in SMS, Microsoft Windows 2000 Server, and Windows 2000 Active Directory.
- IT Infrastructure Library (ITIL) foundation certificate and, ideally, an ITIL
master's certificate.
- Familiarity with the key issues and technology around patch management and software
distribution.
- Training in MOF concepts and principles.
- Experience in managing and delivering complex process-based and technology-based
projects.
The enterprise may also need to formulate its own exercises to ensure that patch
administrators know how to respond to an update notification, create inventory collections,
configure update deployments, create a reference collection, deploy a synthetic
update and evaluate the results, and set up automated functions such as automatic
deployment schedules and enforced installation. Depending on the solution architecture
and configuration, patch administrators may need to know how to write installation
scripts and manually interact with update installation processes.
In addition, generalized operations training may be accomplished through courseware
developed by Microsoft and delivered through a variety of vendors. Applicable courses
include Microsoft Operations Framework Essentials, Microsoft Operations Framework
Changing Quadrant, and Managing a Microsoft Windows 2000 Network and Environment.
The structure of the patch management team should generally match the structure
described in the following sections.
Design Team
The design team creates a design document that outlines how to perform the activities
to be conducted during each phase of patch management. For example, the design document
should describe how the organization registers for, receives, and monitors information
about new updates. Patch management designs will vary according to the size and
complexity of the organization.
Project Team
The project team should review operations tasks in the design document to determine
which ones are required for the patch management solution being created. The project
team then interacts with the IT organization customer to assign responsibilities
to groups or to individuals to facilitate those tasks.
Test Team
The size and number of tasks assigned to the test team depend on a number of factors,
including:
- The size of the IT infrastructure.
- The diversity of programs installed.
- The diversity of connection types and speeds.
- The rigidity or variability of the software version baseline.
- The business priorities.
Someone with proven, in-depth technical skills should lead the team. Microsoft IT
recommends that the team include personnel who are responsible for supporting the
patch management solution after it is deployed. The team as a whole should have
a good understanding of the business, the business objectives, and the reasons behind
the deployment.
Patch Project Manager
The patch project manager investigates the update and reads the relevant Knowledge
Base article to understand what the update fixes. The patch project manager must
consider whether an update applies only to particular scenarios or configurations.
It might be necessary to build SMS queries and Web reports to obtain this information.
For example, if the Knowledge Base article states that the update is required only
for computers with a certain processor or minimum amount of memory, the patch project
manager may need to create a query or Web report to determine whether the update is
applicable to any computers in the enterprise.
Alternatively, if the update in question is a critical security update, the patch
project manager may need to forward the update information so that the update can
be applied preventively to forestall any future vulnerability.
Patch Administrator
The patch administrator should:
- Develop a plan for making the required changes.
- Determine and obtain the resources required.
- Arrange for the development of any scripts, tools, and documentation needed to
deploy the changes.
- Ensure that adequate testing is carried out.
- Ensure that the changes are deployed into production.
- Assess the success or failure of deployment.
A designated patch administrator ensures that necessary collecting, configuring,
testing, patching, and reporting steps are all followed so that no systems are left
unprotected within the production environment. SMS 2003 includes several predefined
patch-related reports for discovering missing updates. They display such information
as applicable security updates and installation status for a specified update. For
example, the "Count of Applicable Software Updates by Type" report identifies updates
that are relevant to computers in the production environment but that have not yet
been installed.
Identify Computers That Were off the Network
Enforcing security policy can be complicated by distributed administration and vulnerable
assets that are not centrally managed. Those responsible for administering the asset
and resolving the vulnerability may be unknown or hard to find, may physically reside
within another department in the organization, or may not have the necessary skills
to resolve the vulnerability on their own.
Baseline the Environment
A baseline is a set of documented standard configurations of a product or system
that is established at a specific point in time. Baselines establish a standard
that systems of the same class and category are required to match. Effective IT
operations use baselines as a trusted point from which systems are built and deployed.
Microsoft IT updates server baselines twice a year. The configuration defined by
a baseline should be stringently tested and hardware vendor certified.
Baselining requires an accurate inventory of the computers and services within the
production environment, including all software that is required by different types
of systems or server roles, such as print server, domain controller, or messaging
server. The baselines include operating system version and application versions
plus all required software updates. A number of baselines might be required, depending
on the variation of hardware, software, and business organization.
Baselines for servers should be as simple as possible, and rigidly enforced. At
Microsoft, the minimum server version level for most servers is Windows Server 2003.
Some servers are already running prerelease versions of the next major release of
the Windows Server operating system. This is a much narrower definition than is
allowed for desktop computers. This minimum level ensures that all servers throughout
the organization have also installed SMS 2003 Advanced Client.
Servers that fall below a release or update baseline should be addressed through
problem management and update planning and deployment. After an initial baseline
inventory scan, bring the sub-baseline hosts up to baseline compliance. During subsequent
cycles of inventory, if the host repeatedly falls below the baseline, forward information
about that incident to the problem management organization for further investigation.
This symptom may indicate a system that has issues regarding distribution, schedules,
or permissions, or that may require special care through exception handling.
At Microsoft, servers that exceed the upper limit of the baseline range are not
automatically exempt from patch management. Systems that exceed the approved baseline
contain application versions or updates that have not been interoperability tested
and formally approved by Data Center Operations and Corporate Security. This fact
is something to consider when a server is running beta or prerelease software, which
is a constant factor at Microsoft.
Check computers that exceed their class baseline to determine whether unauthorized
changes have occurred. In some cases, an above-baseline system may need to be returned
to a trusted level or exempted from update deployments until the corporate baseline
has risen to match it.
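The below-baseline and above-baseline cases described in this section can be expressed as a comparison against a baseline range. The following is a minimal sketch: representing a level as a (major, minor, service pack) tuple is an assumption made for illustration, and real baselines also cover application versions and required updates.

```python
def classify(host_level, baseline_min, baseline_max):
    """Classify a host's release level against the approved baseline range.

    Levels are comparable values, such as (major, minor, service_pack) tuples.
    """
    if host_level < baseline_min:
        return "below"   # bring up to baseline; escalate if it recurs
    if host_level > baseline_max:
        return "above"   # check for unauthorized or untested changes
    return "within"

# Hypothetical baseline range that allows exactly one level.
BASELINE = ((5, 2, 0), (5, 2, 0))
for level in [(5, 0, 4), (5, 2, 0), (6, 0, 0)]:
    print(level, classify(level, *BASELINE))
```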
Consolidate Updates into Service Packs
After a number of updates accumulate for a software version, Microsoft rolls these
fixes into a service pack. The service pack contains all the software updates and
security updates that have been issued up to the release of the service pack.
The patch administrator should become familiar with the correlation between a service
pack and the updates that it contains. This familiarity can streamline the process
of installing the proper baseline on a computer newly brought online. This way,
a new computer might be installed with the original software plus one or two service
packs, instead of retrieving and installing a large number of updates to bring the
platform up to the corporate baseline.
Continually Improve the Process
Staffing is as important as the features of SMS 2003, the implementation, the server
hierarchy, and how the patch management process is defined. To arrive at the best
process, implement a process change structure and follow it to make and track changes
to the standard process until it reaches a satisfactory level of performance and
the desired percentage of automated update installation and compliance.
For example, because Microsoft releases updates on a monthly schedule, Microsoft
IT must plan ahead and resource appropriately. Figure 7 shows an example of the
work breakdown structure for regular monthly patch management activities so that
proper staffing levels can be arranged in advance.
Figure 7 Sample monthly patch management work breakdown structure
In addition, Microsoft IT operates on the principle that "you cannot improve what
you do not measure." To improve your process over time, gather performance statistics.
Some suggested metrics to collect are shown in Table 4.
Table 4 Sample Patch Management Metrics
| Measurement | Example | Trend | Actions to take |
| Patching activity | Five per month | N/A | Baseline for comparison. |
| Ratio of rejected patch Requests for Change (RFCs) | One out of six | [trend graphic] | Document RFC completion requirements. Educate staff on RFC completion requirements. Enforce RFC completion through Change Log tool. |
| Ratio of emergency patches | One out of four | [trend graphic] | Implement mitigation strategies and tactics to reduce attack surface. |
| Patch success ratio (per patch) | 97% | [trend graphic] | Systematically document and incorporate failure modes into testing scheme. |
| Number of support incidents (per patch) | Nine | [trend graphic] | Produce reusable workarounds. Bring rogue systems into baseline compliance (upgrade, service pack, etc.). Provide self-help on website. Push self-help to users in e-mail, voice mail, or other notification mechanism. Better prepare and educate helpdesk. |
| Cost of downtime, productivity loss, or lost business transactions per update | $25,000 | [trend graphic] | Process improvements that lower this cost improve profitability. Use this number to guide patching timelines. |
| Time from test success to 60% saturation deployment | 1: 75 hours; 2: 12 days; 3: 30 days | [trend graphic] | Circumvent network bandwidth and bottleneck issues. Resolve policy and compliance issues. Resolve notification failures or miscommunications. Note maintenance period changes for renegotiation. Note workload and cycles for capacity planning purposes. |
| Identify time from 60% to 80% saturation deployment | 1: 25 hours; 2: 10 days; 3: 30 days | [trend graphic] | |
| Identify time from 80% to 90% saturation deployment | 1: 10 days; 2: 10 days; 3: 30 days | [trend graphic] | |
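The saturation metrics in Table 4 can be derived from per-server installation timestamps. The sketch below is illustrative only: the input format, the function name, and the sample data are assumptions, and it uses integer ceiling division to avoid floating-point rounding when computing how many servers a percentage represents.

```python
from datetime import datetime, timedelta

def time_to_saturation(start, install_times, total_servers, percent):
    """Return the elapsed time until `percent` of servers were patched,
    or None if that level was never reached.

    `install_times` holds one timestamp per successfully patched server.
    """
    needed = -(-percent * total_servers // 100)  # ceiling division
    if needed < 1:
        return timedelta(0)
    done = sorted(install_times)
    if len(done) < needed:
        return None
    return done[needed - 1] - start

# Hypothetical data: 10 servers patched at these hours after test success.
start = datetime(2004, 2, 12, 0, 0)
installs = [start + timedelta(hours=h) for h in (1, 2, 3, 5, 8, 13, 21, 34, 55, 89)]
for level in (60, 80, 90):
    print(level, time_to_saturation(start, installs, 10, level))
```

Subtracting consecutive results gives the interval metrics, such as the time from 60% to 80% saturation.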
For More Information
For additional information about how to deploy, operate, maintain, and support SMS,
visit http://www.microsoft.com/smserver/.
For details about Microsoft Solutions for Management (MSM) and the MOF, visit http://www.microsoft.com/technet/itsolutions/cits/mo/default.mspx
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
your local Microsoft subsidiary. To access information via the World Wide Web, go
to:
http://www.microsoft.com/
http://www.microsoft.com/technet/itsolutions/msit/default.mspx
For any questions, comments, or suggestions on this document, or to obtain additional
information about Microsoft IT Showcase, please send e-mail to: showcase@microsoft.com
The information contained in this document represents the current view of Microsoft
Corporation on the issues discussed as of the date of publication. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a
commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy
of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user.
Microsoft grants you the right to reproduce this White Paper, in whole or in part,
specifically and solely for the purpose of personal education.
Microsoft may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Microsoft, the furnishing
of this document does not give you any license to these patents, trademarks, copyrights,
or other intellectual property.
© 2004 Microsoft Corporation. All rights reserved.
Microsoft, Active Directory, Windows, and Windows Server are either registered trademarks
or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks
of their respective owners.