Infrastructure Management at Microsoft
Technical Case Study
Published: August 22, 2006
Situation
Microsoft IT operated a decentralized and inconsistent service support function
in managing IT infrastructure dispersed across the Microsoft global enterprise.
This decentralized support model led to operational inefficiencies, inconsistent
software configuration, duplication of efforts, and higher operating costs. Support
coverage issues were challenging to manage and were often reactive instead of proactive.
Solution
Microsoft IT consolidated and improved the service support by using a centralized
infrastructure management model that takes advantage of Microsoft System Center
technologies. Microsoft IT uses MOM 2005, SMS 2003, and Data Protection Manager
to automate system configurations, perform automatic software updates, proactively
monitor the IT environment, and back up and restore data on business-critical servers
hosted across the global enterprise. Microsoft IT also aligned its service support
model with Microsoft Operations Framework to deliver consistent and predictable
operational support.
Benefits
- Consistent software configurations across Microsoft IT elements
- Significant reduction in operating costs
- Reduced complexity for managing IT services across the global enterprise
- Improved responsiveness, providing stable and predictable service support
to customers
Products & Technologies
- Microsoft Operations Manager 2005
- Microsoft Systems Management Server 2003
- Microsoft Data Protection Manager 2006
- Microsoft Office Business Scorecard Manager 2005
- Microsoft Operations Framework
- Microsoft Windows SharePoint Services
- Microsoft Office Live Communications Server 2005
This case study outlines how Microsoft Information Technology (Microsoft IT) centralized
its management of IT infrastructure to increase efficiency, reduce operating costs,
and provide a high quality of service to its customers. Microsoft IT embraces centralized
management through Microsoft® System Center technologies, application of the
Microsoft Operations Framework (MOF), and use of the Infrastructure Optimization
Model.
Enterprise businesses can benefit from centrally managed infrastructure support
teams by using System Center technologies to drive down costs and improve services.
As IT infrastructure management evolves, Microsoft IT uses these and other technologies
to manage its own corporate IT infrastructure from strategic locations.
Based on research published in April 2004, Gartner believes:
"The first step toward an infrastructure utility is in centralization and standardization.
The diversity and fragmentation of IT components must be brought under control with
a serious centralization and standardization effort."
Reference: "Gartner Introduces the Infrastructure Utility Maturity Model," April
23, 2004
This case study intends to give enterprise business decision makers, technical decision
makers, and IT professionals an insight into the experience of Microsoft in transitioning
infrastructure management to a centralized management solution based on Microsoft
System Center technologies and the MOF.
Situation
Microsoft IT drives IT strategies and delivers the global IT infrastructure support
services for Microsoft Corporation. In addition, Microsoft IT directly participates
in the early adoption of new technology and in many cases deploys prerelease versions
of Microsoft products into the production environment. In this way, Microsoft IT
gives critical feedback to the product groups, providing valuable guidance for product
improvements and for sharing best practices with Microsoft customers worldwide.
Microsoft IT makes this possible by acting on the following key objectives:
- Being the "first and best customer" to the Microsoft product development groups
by providing "design for operations" feedback on key enterprise products.
- Managing large-scale beta deployments of the abovementioned products while ensuring
minimal to no downtime to its supported business groups.
- Running a world-class infrastructure utility (server, network, telephony) and providing
a scalable hosting platform for line-of-business applications.
In 2002, Microsoft IT began the Model Enterprise Initiative, which focused on improving
manageability, reducing complexity, and consolidating servers and data centers worldwide,
thereby significantly reducing operating costs. Microsoft IT analyzed network connectivity,
customer locations, and data-center management costs for each of its 24 data centers.
By the end of 2004, Microsoft IT had reduced the number of data centers to four
production centers and one disaster recovery center, hosting more than 10,000 servers
globally.
Microsoft IT then addressed an opportunity for further improvements through the
consolidation of decentralized infrastructure support teams, focusing on consistent
processes and efficiencies through use of System Center technologies and MOF processes
to support the IT infrastructure.
Business Challenges
Analysis of the existing structure and organization of the infrastructure support
teams identified the following business challenges:
- Configuration changes. The teams were not consistent in their approaches
to configuration changes to servers or devices, which caused excessive administrative
overhead. An inconsistent change process contributed to a large number of support incidents,
and a lack of standardization in configurations increased resolution times and produced
inconsistent results. There was a need for a central change process that would induce
consistency through the organization and that would in turn streamline change events and reduce
the incidents they caused. Microsoft IT also faced a challenge in maintaining
consistency in configuration management and standardizing configuration.
- Software updates. Every six months, Microsoft IT releases an integrated rollup
software package for internal use; it includes critical updates, hotfixes, and application
and driver updates. This package does not replace the normal update process, but
rather augments it by ensuring that all servers are at a consistent level. However,
the worldwide operations teams did not apply this package in a consistent manner.
- Knowledge gaps. In many cases, knowledge of key services resided with one
or two individuals who needed to disseminate information across the teams. Due to
ineffective cross-team knowledge sharing, Microsoft IT often found multiple teams
troubleshooting the same issue or issues that other teams had already solved.
- Business impacts. Microsoft IT customers were not supported in a consistent
fashion across the globe for service requests or for initial responses or resolution
of issues. In some cases, delays of several hours occurred before a particular issue
received a communication or an action.
- Collaboration. Due to the decentralization of the various operations teams,
collaboration efforts were not effective enough to support the global enterprise.
- Cost of IT operations. Reducing the cost of IT operations while improving
the level of service is an ongoing goal for Microsoft IT. Historically, Microsoft
IT spent 50 percent of its IT budget on maintaining existing services and 50 percent
on upgrading or implementing new services.
Solution
Microsoft IT commissioned a review of the existing management tools, processes,
and procedures as part of a plan to implement a centralized management of infrastructure
service that would improve operational efficiency as well as customer experience.
The following figure illustrates the Infrastructure Optimization Model, a set of
criteria that is useful during the review process for determining where an organization
is and where it wants to be.
Figure 1. Example infrastructure optimization review categories
For more information about how Microsoft leveraged the Infrastructure Optimization
Model, see Infrastructure Optimization at Microsoft at:
http://www.microsoft.com/technet/itsolutions/msit/operations/iotsb.mspx.
Following the review, Microsoft IT considered:
- Centralized management tools and their implementation.
- Centralized IT management processes.
- Centralized IT management structure.
- Strategic centralized management locations.
- Transitioning process for centralized management.
Microsoft IT took great care in combining all of these areas into a cohesive solution
that had minimal impact on existing operations. The solution used a number of Microsoft
technologies.
System Center Technologies
To identify the features that would enable centralized management of the Microsoft
IT infrastructure, Microsoft IT reviewed System Center technologies, including Microsoft
Operations Manager (MOM) 2005, Microsoft Systems Management Server (SMS) 2003, and
Data Protection Manager (DPM) 2006. Microsoft IT also reviewed other Microsoft technologies,
such as Microsoft Windows® SharePoint® Services, Microsoft Office Business
Scorecard Manager 2005, and Microsoft Office Live Communications Server 2005. These
technologies also benefited from the reliability, manageability, and security of
Microsoft Windows Server® 2003 and the new cached e-mail features of Microsoft
Exchange Server 2003. The following sections outline the Microsoft technologies
used for the centralized infrastructure management solution.
Microsoft Operations Manager 2005
MOM 2005 provides Microsoft IT with centralized monitoring and, in some cases, automatic
problem resolution, for more than 10,000 managed servers and 10,000 network devices
on the corporate network. MOM 2005 provides Microsoft IT with numerous benefits,
including:
- Event-driven operations monitoring.
- Self-deploying and scalable management solutions.
- Improved system availability, performance tracking, and problem resolution.
- High levels of automation to lower the cost of monitoring Windows-based solutions.
- Operational database, reporting database, and long-term trending database that provide
a wide range of detailed management reports.
The Microsoft product teams are responsible for developing MOM management packs
for every Microsoft server-based product. In addition to using these, Microsoft
IT uses third-party management packs to monitor network devices such as routers,
switches, and wireless access points. The centralized management teams use MOM consoles
to monitor the data-center servers and network devices. All events, regardless of
how they are collected on the back end, are forwarded to a centralized MOM console
in the network operation center (NOC). In some cases, MOM events trigger scripts
to alert teams or to automate response tasks.
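The event-to-script pattern described above can be illustrated with a minimal sketch. This is not MOM 2005 code and does not use any MOM API; the `Event`, `Rule`, and `dispatch` names, the severity labels, and the sample responses are all hypothetical, chosen only to show how a matching rule might trigger an alert or an automated response.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    source: str    # server or device name (hypothetical)
    severity: str  # "info", "warning", or "critical" (assumed labels)
    message: str


@dataclass
class Rule:
    matches: Callable[[Event], bool]  # predicate deciding whether the rule applies
    respond: Callable[[Event], str]   # response script; returns a description of the action


def dispatch(event: Event, rules: List[Rule]) -> List[str]:
    """Run every response whose rule matches the event and collect the actions taken."""
    return [rule.respond(event) for rule in rules if rule.matches(event)]


# Two illustrative rules: one alerts a team, one automates a response task.
rules = [
    Rule(lambda e: e.severity == "critical",
         lambda e: f"page on-call team about {e.source}"),
    Rule(lambda e: "service stopped" in e.message.lower(),
         lambda e: f"restart service on {e.source}"),
]

actions = dispatch(Event("web01", "critical", "Service stopped unexpectedly"), rules)
```

In this sketch a single event can trigger both an alert and an automated fix, which mirrors the idea that some events only notify a team while others also start a response task.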
MOM 2005 script automation and remote monitoring enable the centralized management
teams to identify potential problems and work proactively to minimize their impact
on business. The MOM management hierarchy uses MOM data warehousing to identify
trends and predict upgrade or resource requirements across all four production data
centers from a central management point. MOM management servers and MOM Web consoles
are ideal for central management, because they use low-bandwidth protocols. This
architecture prevents management data from saturating expensive wide area links.
The ticketing system that the support teams use is tightly integrated with the MOM
2005 connector framework.
For more information about how Microsoft deployed MOM 2005, see Deploying Microsoft
Operations Manager 2005 at Microsoft at:
http://www.microsoft.com/technet/itsolutions/msit/deploy/deploymom2005.mspx.
Systems Management Server 2003
Microsoft IT uses SMS 2003 to deploy software updates to servers that reside in
corporate data centers and to nearly every desktop computer in the corporate domains.
Software updates include security updates, hotfixes, drivers, and integrated rollup
packages.
Microsoft IT examined patching requirements at Microsoft and decided to operate
two separate SMS architectures, one for patching server updates and the other for
patching desktop and laptop computers. Microsoft IT based this decision on these
key factors:
- Security updates are more critical for servers than for desktop computers because
servers affect the security and workflow of large groups of workers.
- Microsoft IT determined that it could more easily meet the short time frame for
patching servers if it did not have to share the infrastructure for patching servers
with the resources and sustainer functions regularly running for managing desktop
computers.
- The software platform baseline for servers at Microsoft is uniform and unilaterally
enforced, whereas desktop computers run a wide variety of software versions, applications,
and service pack levels.
SMS enables a centrally managed support team to automate the update process and
report on computer compliance with a relatively low number of maintenance personnel.
In addition, SMS enables Microsoft IT to deploy software updates in all four production
data centers consistently and within a set time limit. SMS inventory and reporting
enables Microsoft IT to know its assets and plan upgrades to suit customer requirements.
Microsoft IT uses the SMS 2003 Desired Configuration Monitoring tool, a powerful
solution to monitor configuration settings across all server roles and hardware
types for noncompliance. Administrators can define desired configuration models
with templates and enable SMS 2003 to proactively view noncompliance in the Windows
Management Instrumentation (WMI), Active Directory® directory service, Internet
Information Services (IIS) metabase, registry, and file system settings. The SMS
2003 solution sends alerts through MOM to the administrators when noncompliance
is detected from the predefined desired configuration. This helps in identifying
undesired configuration changes that might result in security breaches or service
disruptions.
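The idea behind desired configuration monitoring, comparing observed settings against a predefined model and reporting differences, can be sketched as follows. This is an illustration only and does not reflect how the SMS 2003 tool is implemented; the `find_noncompliance` function, the setting names, and the values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A setting is identified by (data source, setting name). The sources mirror the
# kinds listed in the text (registry, IIS metabase, and so on); the names are made up.
Setting = Tuple[str, str]


def find_noncompliance(desired: Dict[Setting, str],
                       actual: Dict[Setting, str]) -> List[str]:
    """Return one finding per setting whose actual value differs from the desired one."""
    findings = []
    for key, want in sorted(desired.items()):
        have: Optional[str] = actual.get(key)  # None means the setting is missing
        if have != want:
            source, name = key
            findings.append(f"{source}:{name} expected {want!r}, found {have!r}")
    return findings


desired_model = {
    ("iis", "LogFormat"): "W3C",
    ("registry", "AutoUpdate"): "enabled",
}
observed = {
    ("iis", "LogFormat"): "W3C",
    ("registry", "AutoUpdate"): "disabled",
}

findings = find_noncompliance(desired_model, observed)
```

Each finding could then be forwarded to a monitoring system as an alert, which is the role MOM plays in the solution described above.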
For more information about SMS 2003, go to the Systems Management Server home page
at:
http://www.microsoft.com/smserver/default.mspx.
For more information about how Microsoft uses SMS for server security patch management,
see Server Security Patch Management at Microsoft at:
http://www.microsoft.com/technet/itsolutions/MSIT/Security/SMS03SPM.mspx.
For more information about how Microsoft uses SMS for desktop patch management,
see Systems Management Server 2003: Desktop Patch Management at Microsoft
at:
http://www.microsoft.com/technet/itsolutions/msit/deploy/smsdesktoptwp.mspx.
Data Protection Manager 2006
Microsoft requires the ability to protect and restore data centrally so that employees
in the field can concentrate on their core functions. Microsoft IT needed an alternative
to tape-based solutions for providing data protection and restoration services to
the company's 130 branch offices. As personnel, hardware, and software changed,
a need existed for constantly retraining staff at remote locations.
DPM 2006 augments traditional tape-based backups by using disk-to-disk copy. Microsoft
IT uses DPM to back up 130 branch offices, and it expects to save $1.1 million U.S.
in the first two years of deployment. DPM helps Microsoft IT provide a better service
in several ways:
- User intervention. Local users do not need to remember to rotate the data
backup tapes into tape backup hardware.
- Automated monitoring. Microsoft IT uses the DPM Management Pack for MOM 2005
to verify the success and health of the backed-up production servers. The management
pack gives the operators just-in-time alerts about issues that they need to fix
and has improved the monitoring team's efficiency by more than 300 percent.
- Faster and more reliable restorations. DPM provides rapid and reliable recovery
of data lost because of user error or server hardware failure. End-user recovery
enables users to independently recover their own data by retrieving previous versions
of files through Windows Explorer or directly from Microsoft Office System applications.
- Verification of backups. Engineers can easily verify the success of a backup.
- Monitored backup process. Microsoft IT uses the DPM MOM management pack to
verify the success and health of the backup process.
For more information about how Microsoft uses Data Protection Manager, see Deploying
Data Protection Manager at Microsoft at:
http://www.microsoft.com/technet/itsolutions/msit/deploy/dpmtcs.mspx.
Strategic Centralized Management Locations
Microsoft IT identified two geographical locations to centrally manage its global
infrastructure remotely. Network Operations Centers (NOCs) in North America and
India provide support for North and South America, Europe, Middle East, Africa,
and the Asia Pacific regions during each location's core business hours. The infrastructure
runs 24 hours a day, seven days a week, and each location is configured as a business
continuity site to provide failover support.
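The two-site coverage model with failover can be sketched in a few lines. The shift windows below are assumptions made for illustration; the case study does not state the actual hours each NOC covers, and `on_duty` is a hypothetical helper, not part of any Microsoft tool.

```python
from typing import AbstractSet, Dict

# Hypothetical shift windows in UTC hours (not taken from the case study).
SHIFTS: Dict[str, range] = {
    "India": range(2, 14),           # 02:00-13:59 UTC
    "North America": range(14, 26),  # 14:00-01:59 UTC (wraps past midnight)
}


def on_duty(hour_utc: int, unavailable: AbstractSet[str] = frozenset()) -> str:
    """Pick the NOC covering this hour, failing over to the other site if needed."""
    # Adding 24 lets an early-morning hour match a window that wraps past midnight.
    primary = next(site for site, hours in SHIFTS.items()
                   if hour_utc in hours or hour_utc + 24 in hours)
    if primary not in unavailable:
        return primary
    backups = [site for site in SHIFTS if site not in unavailable]
    if not backups:
        raise RuntimeError("no NOC available")
    return backups[0]
```

For example, `on_duty(3)` selects the India site under these assumed hours, while `on_duty(3, {"India"})` fails over to North America.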
To support this model, Microsoft IT implemented several Microsoft technologies.
The System Center technologies enabled remote management of data centers across
a wide area network. Deployment of Live Communications Server 2005 enabled real-time
communication and eliminated productivity delays between sites, teams, and groups.
Collaboration was enhanced through Windows SharePoint Services features such as
document version history, document check-in, and document check-out, for storing
technical support guides and process documents. Team members used features such
as Live Meeting, Meeting Workspace, and Document Workspace, available through the
integration of the Microsoft Office System, for knowledge management.
Microsoft Operations Framework
To better deal with IT challenges, Microsoft IT took advantage of the Microsoft
Operations Framework. MOF provides operational guidance that enables organizations
to achieve mission-critical system reliability, availability, supportability, and
manageability of Microsoft products and technologies.
MOF provides Microsoft IT with prescriptive guidance that enhances agility, reliability,
and efficiency for managing IT infrastructure. Microsoft IT uses MOF to improve
all aspects of IT management, from the implementation of a service to optimizing
it.
MOF has four quadrants: changing, operating, supporting, and optimizing. Figure
2 illustrates where key management processes fit into MOF.
Figure 2. Centralized management functions and teams in the MOF quadrants
Prioritization Grid
The support teams consist of both Microsoft employees and vendors. During its review
of IT services, Microsoft IT created a prioritization grid to determine how to organize
and delegate core business activities and support functions.
Microsoft IT determined that although most mission-critical tasks were not suitable
for delegating to vendors, many other non-critical areas of support were. Microsoft
IT then began the process of finding suitable vendors to take on this role. Microsoft
IT canvassed existing and new vendors, following the same formal procedure.
Project Plan
After creating a prioritization grid, Microsoft IT's next step was to evaluate and
mitigate the risks involved in remote management. Microsoft IT developed a project
plan that had three key stages.
Stage 1: Planning
The planning stage was the most intensive aspect of the process. Microsoft IT needed
to fully evaluate, deploy, and understand the technologies that perform remote management
tasks. In addition, Microsoft IT had to create a structure that would help ensure
that the two management sites performed at the same level for incident and change
management. The structure included processes, tools, and communication mechanisms.
The planning stage identified the required resources in terms of people, network
connectivity, management tools, and remote management processes.
Figure 3 shows the phases that Microsoft IT developed during the planning stage.
Figure 3. Phases of the planning stage
Stage 2: Transitioning
During the transitioning stage, Microsoft IT needed to identify and select the right
people for the appropriate roles, assess technical skill levels of all resources,
and (if necessary) provide appropriate training. To provide a consistent and equivalent
level of operations from all locations, the teams shadowed each other for several
weeks, using the same tools and processes. This activity helped identify weaknesses
in skills, processes, and procedures. It also helped with knowledge transfer and
identifying areas where documentation was poor. Microsoft IT was therefore able
to revise and improve communications, operational procedures, and documentation
for consistency for all support locations.
Prior to going live, Microsoft IT tested the support rollback process and the disaster
recovery process to ensure that the level of operations was sustained and customer
satisfaction was not affected during and after the transition. Microsoft IT, in
accordance with MOF prescriptive guidance, regularly tests the disaster recovery
process for business continuance.
Stage 3: Going Live
The going-live stage included the initial transition of some operations to the vendor
teams and observing the teams as they worked independently. Over a short period
of time, with constant assessments and evaluations, Microsoft IT quickly identified
improvement areas. Microsoft IT then developed and implemented plans to drive corrective
measures into systems, tools, and processes.
Centralized Management Support Structure
The existing Microsoft IT NOC consisted of three teams that used different processes
to perform a variety of technical support functions. These functions ranged from
basic, routine tasks to highly complex issues. The three teams were responsible
for the following activities:
- Network connectivity, wireless access points, switches, and routers
- Server configuration and monitoring
- Telephony switches and hardware
One of the business challenges that Microsoft IT identified was a lack of standardized
processes and workflow efficiencies between these three operationally independent
teams. Several teams performed most incident response and change request work. Limited
ownership and a lack of collaboration resulted in inefficiencies, and the organization
barriers were a hindrance to the support structure. With interdependencies in technology,
there was also a need for knowledge convergence across support teams to troubleshoot
complex issues. Adhering to support service level agreements (SLAs) on complex issues
was a huge challenge that affected the business.
To facilitate a more efficient and consistent structure, Microsoft IT transformed
the NOC into the following support teams:
- Change operations provided by a global change operations team. This team
is responsible for routine change management.
- Incident operations provided by a global incident operations team. This team
is responsible for customer contact via a service desk team and routine incident
management.
- Problem management provided by technical escalation teams. These teams are
responsible for problem management, creating process efficiencies, and the resolution
of complex incidents and chronic errors.
With this new structure in place, Microsoft IT used MOF service desk, incident management,
problem management, and change management best practices, coupled with MOM 2005
alerts and events, to respond to customer incidents and change requests.
Delegating support tasks between Microsoft IT and vendors enables Microsoft IT resources
to respond more quickly to complex incidents and change requests and focus on chronic
error resolution by using MOF best practices for problem management.
Figure 4 illustrates how Microsoft IT aligned technology teams with MOF roles.
Figure 4. How Microsoft IT aligned the technology teams with MOF roles
Service Desk
Microsoft IT currently uses a service desk team to create and assign tickets to
incident and problem management teams. The service desk team documents all ticket
activities, reports against SLAs, and collects metrics that can be used for problem
management analysis and investigations. The service desk team also builds a knowledge
database that supports repeatable process improvements and provides the Microsoft
product groups with valuable feedback.
Incident Management
The primary goal of the incident management team is to restore normal service operation
as quickly as possible and to minimize the adverse impact on business operations,
thus maintaining the best possible levels of service quality and availability.
Problem Management
The objective of the problem management team in Microsoft IT is to minimize the
adverse impact on the operational ability of a business due to incidents and problems
caused by errors within the IT infrastructure, and to prevent the recurrence of
incidents related to these errors. To achieve this goal, the problem management
team seeks to establish the root cause of incidents and then initiate actions to
improve or correct the situation.
Change Management
The Microsoft IT change management team provides a disciplined process for introducing
required changes into a complex IT environment with minimal disruption to ongoing
operations. The change management team is also closely aligned with the release
management process and manages the release and deployment of changes into the production
environment.
Service Level Management
Microsoft IT developed service level management in line with the requirements and
priorities of the services documented and offered in the service catalog for the
business, and the specific requirements of the negotiated SLAs. Microsoft IT monitors
each service against these requirements in real time, and reports on and reviews
key trends in historical data, to highlight and remove failures that affect the
level of performance of the service.
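Reporting against SLA targets can be illustrated with a small calculation over ticket data. The priorities, resolution targets, and the `sla_compliance` function below are hypothetical; the case study does not publish the actual SLA terms that Microsoft IT negotiated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    priority: str            # hypothetical priority label
    minutes_to_resolve: int  # time from ticket creation to resolution


# Assumed resolution targets in minutes, for illustration only.
SLA_TARGET_MINUTES = {"high": 60, "normal": 480}


def sla_compliance(tickets: List[Ticket]) -> float:
    """Fraction of tickets resolved within the target for their priority."""
    if not tickets:
        return 1.0  # no tickets means nothing missed its target
    met = sum(t.minutes_to_resolve <= SLA_TARGET_MINUTES[t.priority] for t in tickets)
    return met / len(tickets)


sample = [
    Ticket("high", 45),
    Ticket("high", 90),     # misses the assumed 60-minute target
    Ticket("normal", 300),
    Ticket("normal", 480),  # exactly on target counts as met
]
compliance = sla_compliance(sample)
```

Tracking this ratio over time is one simple way to surface the historical trends that the text describes.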
Business Benefits
Microsoft IT has realized several business benefits from consolidating data centers,
transitioning to two centrally managed remote support locations, improving processes,
deploying System Center technologies, Windows SharePoint Services, and Live
Communications Server, and using the Infrastructure Optimization Model.
Consistent Configuration Across the Global Enterprise
In line with MOF, Microsoft IT sets specific and measurable targets for software
updates and configuration settings. As a result of the transition to a single global remote
support infrastructure and the use of SMS 2003, all servers receive the latest updates
on a predetermined schedule and receive consistent configurations for their appropriate
roles. Consistent configuration reduces the vulnerability of computers, aids in
troubleshooting, and ultimately reduces operating costs. Having a consistent server
configuration is one of the biggest operational improvements gained from the transformation
to centralized management.
Improved Efficiency Through MOF Prescriptive Guidance
With the support team split between the two geographical locations, Microsoft IT
has improved the efficiency of its continuous support of the business. The integration
of MOM 2005 into the incident management workflow has yielded improved performance
and reduced ticket ratios. A faster response and improved incident resolution mechanism
have helped to increase the system uptime, leading to increased business productivity
and improvements in customer satisfaction.
Improved Disaster Recovery and Business Continuance
Microsoft IT now has a fully implemented and tested disaster recovery and failover
process for managing the infrastructure remotely. This process has strict guidelines
for how and when a failover should occur. By using MOM 2005, the remote management
teams can monitor and manage the disaster recovery data center from either remote
location. Using DPM and aligning the disaster recovery and failover support processes
with MOF have enabled Microsoft IT to improve business continuity and availability.
Reduced Operating Costs
The consolidation of data centers alone saved $100,000 in operational costs. Transitioning
to two remote support centers, delegating support tasks to both vendors and
Microsoft IT personnel, and automating repetitive tasks (such as software
updates, configuration updates, and in some cases, rectifying server issues) have
saved more than $700,000 per year.
For more information about how Microsoft measures IT performance, see Scorecards
Provide a Foundation for Business Performance Management at Microsoft at:
http://www.microsoft.com/technet/itsolutions/msit/deploy/scorecardbusperftcs.mspx.
Improved Business Performance Reports
By using Office Business Scorecard Manager, Microsoft IT generates IT scorecards,
which highlight key areas of service management SLAs to validate and track the efficiency
of all aspects of the co-location processes. The scorecards help Microsoft IT identify
areas for improvement. Microsoft IT also evaluates its service and reports against
SLA targets to the business through regularly updated reports and scheduled engagements.
For more information about how Microsoft IT uses Office Business Scorecard Manager,
see Scorecards Provide a Foundation for Business Performance Management at Microsoft
at:
http://www.microsoft.com/technet/itsolutions/msit/deploy/scorecardbusperftcs.mspx.
Summary
To centralize infrastructure management, Microsoft IT used Microsoft products and
technologies, such as MOM 2005, SMS 2003, DPM, Windows SharePoint Services, and
Live Communications Server. Microsoft IT also aligned the operational management
processes to the Microsoft Operations Framework and leveraged the Infrastructure
Optimization Model.
The centralized infrastructure management solution has realized the following:
- Consistent software configuration across Microsoft IT
- Significant reduction in operating costs by delegating repetitive tasks to vendors
- Consolidation of operational management to two strategic global locations and reorganization
to two teams
- Improved operational efficiency
- Improved collaboration between locations and teams
- Improved consistency and quality of the services provided to customers globally
For More Information
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
your local Microsoft subsidiary. To access information through the World Wide Web,
go to:
http://www.microsoft.com
http://www.microsoft.com/technet/itshowcase