In any digital transformation, technology and culture changes go hand in hand. Core Services Engineering (CSE, formerly Microsoft IT) has evolved from a process-centered, rigid, manual operations model with a disconnected customer experience to a Microsoft Azure-based model built on modern engineering principles, such as scalability, agility, and self-service, that focus on the customer experience.
Microsoft embarked on a bold, three-step strategy to build best-in-class platforms and productivity services for the mobile-first, cloud-first world. This strategy harmonizes the interests of users, developers, and IT. To effectively deliver on the strategy, we needed to rethink our infrastructure and operations platforms, tools, engineering methods, and business processes to create a collaborative organization that can deliver cohesive and scalable solutions.
Our operations history
Like most IT organizations, our traditional hosting services were mostly physical, on-premises environments that consisted of servers, storage, and network devices. Most of the devices were owned and maintained for specific business functions. The technologies were very diverse and needed specialized skills to design, deploy, and run.
Traditional IT technologies, processes, and teams
Server technologies included discrete servers and densely built computing racks with blade servers. Storage technologies used direct-attached storage (DAS) and storage area networks (SANs). Networks used a variety of technologies, from simple switches to more advanced load balancers, encryption, and firewall devices. Platform technologies ranged from Windows, SQL Server, BizTalk, and SharePoint farms to third-party solutions such as SAP and other information security–related tool sets. Server virtualization evolved from Hyper-V to System Center Virtual Machine Manager and System Center Orchestrator.
To provide a stable infrastructure, we needed a structured framework, such as IT Infrastructure Library/Microsoft Operations Framework (ITIL/MOF). Policies, processes, and procedures in the framework helped to enforce, control, and prevent failures. Engineering groups that used hosting services had a similar adoption process for their application and service needs, based on ITIL/MOF and combined with a software development life cycle (SDLC)/waterfall framework. Teams formed naturally around people with similar core strengths in the ITIL areas of service strategy, service design, service operations, and service transition, as shown in Figure 1.
Traditional hosted environments relied on external sources of space, power, connectivity, hardware, and software. And the technologies behind these sources evolved slowly. A common framework of policies and procedures helped bring teams together to refine and unify procedures. Tools were developed to formalize, track, audit, and measure procedures. The culture of the organization helped build a process-oriented, structured way of getting things done.
Challenges of traditional IT
Although ITIL/MOF helped streamline some processes, the complexities, constraints, and dependencies of traditional hosting prevented agile engineering. For example, it usually took six to nine months to build a new development environment for an application or service team. This time included planning, coordinating resources, tracking issues, and mitigating risk. Although the structure added clarity in delivery, it removed business agility.
Long-term managed services offered opportunities to build cost efficiency. But, because of the way processes were implemented, functional roles were often duplicated. This created an overall negative impact on time and cost.
When engineering teams used SDLC waterfall methods and operations teams used ITIL/MOF, adhering to process took priority over delivering iterative, agile solutions to meet targeted business needs. These processes slowed business throughput significantly. Solutions were developed and deployed over years instead of months.
Phase 1: Improving operational efficiency
CSE plays a pivotal role in the company’s new strategy, as most business processes in the company depend on us. To help Microsoft transform, we identified key focus areas to improve in the first phase of our transformation: improving business agility, reducing costs, learning new skills, and inventing new ways to work. Figure 2 shows the steps we took to get to Azure.
Infrastructure Platform. An agile business demands agile infrastructure, fewer physical servers, and a move to, and innovation in, Azure.
Strategy. Migrating to the cloud highlighted the need for build, change, and policy management processes as self-service capabilities. Our approach is to use software to automate provisioning, management, and coordination of services, so our Microsoft business partners can develop and deploy services faster with less work and lower cost.
Structure. We had to rethink the way that our teams and roles delivered this strategy by integrating different teams that did similar tasks. This allowed us to effectively design and deliver end-to-end service offerings at lower cost. Our organization was restructured to form teams that optimize service and infrastructure. These teams learn new skills, work harmoniously with engineering, and reduce waste.
Culture. We embraced a growth mindset, learned new skills, built new capabilities, and found new ways to work.
Mission. It became our mission to define, deliver, and transform how we work by helping engineers build solutions tailored to the hybrid cloud world.
Realigning our organization
Services optimization. This team helps our business partners to provision and manage their own IT services. We have improved operational agility and reliability, which has resulted in specific benefits:
- Less manual effort per release/update
- Shorter lead time
- More frequent builds and deployment
- Increased service quality
- Reduced security exposure
We elevated our teams by training people and hiring others with the engineering skills we need. Our goal is to gradually transition people from operational skills to service engineering skills.
A deeper analysis of our operational model also revealed redundant processes in service design, service transition, and service operations. After careful consideration, we reduced process overhead by eliminating or automating some processes. This restructuring presents a business opportunity to consolidate vendor teams. Many of our sustained workloads will decrease year over year, as on-premises infrastructure shrinks.
Infrastructure Optimization. This team eliminates duplicate infrastructure, reduces our footprint, and modernizes infrastructure for our business partners, which reduces hosting costs. Key outcomes of this work include:
- Consolidated datacenters
- Fewer physical and traditional virtual machines
- Smaller storage consumption
- Increased cloud adoption
When teams started working together to optimize infrastructure, they found duplicate projects with similar goals. After we cut redundant projects, people were freed up to learn project management skills and to engage with our business partners.
This team took a program-based delivery approach with start and end dates. After provisioning was automated, we worked with our business partners so they could use new self-service tools to take ownership of their infrastructure. The new self-service features helped our business partners identify and decommission unused servers. Self-service planning eliminates manual handoffs, and enables our business partners to manage risks, issues, and blockers. Our business partners also found that they no longer needed vendors to manage hand-offs.
Reinventing our culture
To reinvent ourselves, we needed to change. We stopped managing processes and began trusting our business partners and empowering engineers. We defined our new mindset and goals to:
- Focus on the customer by designing and building new services from their perspective.
- Challenge and question the status quo, and rethink old processes and behaviors.
- Experiment and learn so we can produce innovative cloud technologies using agile methods.
- Collaborate beyond our organizational boundaries to identify and deliver the right solution for our business partners.
- Deliver faster and fix issues faster.
The business outcome
Combined, all the changes we made produced tangible results. We improved our agility and enabled our Microsoft business partners to deploy services faster with less work at a reduced cost. We were able to:
- Reduce manual work by about 60 percent.
- Migrate 10 percent of the CSE ecosystem to the public cloud (Azure IaaS).
- Decommission on-premises data centers across the pre-production ecosystem.
- Optimize about 42 percent of our global workforce.
- Save about $6.5 million in organization operational costs.
Lessons learned in Phase 1
Through this process of technological and cultural evolution, we learned that:
- Next-generation, modern applications will come from innovating in Azure. A private cloud cannot provide the innovations and scale that Azure can.
- There are a multitude of technical requirements to help our Microsoft business partners migrate to Azure.
- Tools that support the private cloud don’t scale for Azure, which significantly impacts agility.
- Processes established for a private cloud cause a fragmented and disconnected experience in Azure.
- Capability gaps in connecting Azure inventory, utilization, and cost led to a drastic increase in Azure operational cost.
Phase 2: Delivering value through innovation
To effectively harness the benefits of Azure, we migrated 90 percent of our IT infrastructure to Azure and then balanced the business need for innovation with efficient operation. We decided to use native cloud solutions, phase out customized IT tool sets, and decentralize and simplify operations processes as we adopt the DevOps model.
DevOps is a work model that integrates software developers and IT operations. As we move to the cloud, IT infrastructure support is drastically reduced. Going forward, we offer the most value to our business partners by adopting Infrastructure as Code to achieve friction-free interaction with engineering teams and support continuous deployment. We redefined operations roles and retrained people from traditional IT roles to be business relationship managers, engineering program managers, service engineers, and software engineers:
- Business relationship managers engage with our Microsoft business partners to understand their needs and to tailor Azure capabilities for their business needs. Business relationship managers listen, prioritize, and manage expectations across business, infrastructure, and Azure teams.
- Engineering program managers design and deliver solutions in partnership with software engineers, service engineers, and business relationship managers.
- Software and service engineers focus on developing reliable, scalable, and high quality automated services, which eliminates much manual work. As we retrained people from operational to engineering and relational skills, we saw a gradual uptick in engagement with our business partners.
Simplifying operational processes
In the past, the processes that Microsoft used to manage corporate inventory, procurement, software development, security management, financial management—and other functions—were disconnected from each other and confined within organization boundaries. And existing processes and tools resulted in long wait times for simple IT tasks.
A simple application infrastructure took at least 40 days to provision, and complex applications with multiple dependencies could take over a year. The traditional IT mindset, processes, and obsolete tools had a negative impact on software engineering productivity. IT operations processes were realigned as shown in Figure 3.
Azure radically simplified IT operations. Simple projects can be provisioned in Azure within one day, and complex projects can be provisioned in six days. We increased our speed 40-fold by eliminating, streamlining, and connecting processes, and by aligning processes for Azure.
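The 40-fold figure follows from the simple-project numbers above (at least 40 days before, about one day in Azure). The short sketch below works through that arithmetic. The `speedup` helper is a hypothetical name, and the 365-day value is only a lower bound assumed for "over a year", so the complex-project ratio is illustrative rather than a reported result.

```python
def speedup(before_days: float, after_days: float) -> float:
    """Return how many times faster provisioning became."""
    if before_days <= 0 or after_days <= 0:
        raise ValueError("durations must be positive")
    return before_days / after_days


# Simple project: at least 40 days before, about one day in Azure.
simple = speedup(before_days=40, after_days=1)

# Complex project: "over a year" before, six days in Azure.
# 365 days is an assumed lower bound, not a figure from the text.
complex_lower_bound = speedup(before_days=365, after_days=6)

print(f"simple projects: {simple:.0f}x faster")
print(f"complex projects: more than {complex_lower_bound:.1f}x faster")
```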
Adopting native cloud solutions
We are retiring many customized IT tools and focusing on native cloud solutions using Azure Infrastructure as Code within the Azure Resource Manager (ARM) fabric. By using ARM templates, APIs, and PowerShell (as well as integrating developer tools) we can rapidly provision a hosting platform.
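As a rough illustration of what Infrastructure as Code with ARM templates looks like, the sketch below builds a minimal template that declares a single storage account and prints it as JSON. It is not one of the CSE templates: the function name, the parameter name `storageAccountName`, the choice of a storage account, and the API version are assumptions made for the example.

```python
import json


def storage_account_template(sku_name: str = "Standard_LRS") -> dict:
    """Build a minimal ARM template that declares one storage account."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        # The account name is supplied at deployment time.
        "parameters": {
            "storageAccountName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2019-06-01",
                "name": "[parameters('storageAccountName')]",
                # Deploy into the same region as the target resource group.
                "location": "[resourceGroup().location]",
                "sku": {"name": sku_name},
                "kind": "StorageV2",
                "properties": {},
            }
        ],
    }


template_json = json.dumps(storage_account_template(), indent=2)
print(template_json)
```

Because the template is plain data, it can be kept in source control and reviewed like any other code. A file with this content could then be deployed with standard tooling, for example the `New-AzResourceGroupDeployment` PowerShell cmdlet or `az deployment group create`.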
We also adopted software-defined networking (SDN) by developing APIs to dynamically procure ExpressRoute, load balancing, and traffic management capabilities, which connect, secure, and route traffic and improve application responsiveness. Azure Site Recovery (ASR) is primarily used for lift-and-shift migration of virtual machines.
Azure Operations Management Suite (OMS) is a Software as a Service (SaaS)-based, cross-platform solution with capabilities that span analytics, automation, configuration, security, backup, and disaster recovery. OMS is designed for speed, flexibility, and simplicity, and it effectively manages Windows Server and Linux in a hybrid cloud environment.
Figure 4 shows how native cloud solutions allow many traditional IT processes to become self-service.
ICM is the Incident Management System for Microsoft. With high-availability cloud support, and cloud‑based access, we now support Azure and many other services across Microsoft.
Cloud Cruiser, a third-party SaaS application, gives us valuable financial information and reports about our Azure usage and spending in near-real time. Using Cloud Cruiser, we can examine and aggregate financial data across multiple global Azure subscriptions, which is crucial. Our Azure environment contains many subscriptions—Cloud Cruiser gives us the immediate visibility that’s required to manage and control costs.
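The kind of cross-subscription aggregation described here can be sketched in a few lines. This is not Cloud Cruiser's API or data model: the record fields, subscription names, and amounts below are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical usage records; names and amounts are invented.
usage_records = [
    {"subscription": "finance-prod", "service": "Virtual Machines", "cost_usd": 120.50},
    {"subscription": "finance-prod", "service": "Storage", "cost_usd": 30.25},
    {"subscription": "hr-dev", "service": "Virtual Machines", "cost_usd": 45.00},
    {"subscription": "hr-dev", "service": "SQL Database", "cost_usd": 80.75},
]


def total_by(records: list, key: str) -> dict:
    """Sum cost_usd over all records, grouped by the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)


by_subscription = total_by(usage_records, "subscription")
by_service = total_by(usage_records, "service")

for name, cost in sorted(by_subscription.items()):
    print(f"{name}: ${cost:,.2f}")
```

Grouping the same records by a different field (here, `service`) gives a second view of spending without any change to the underlying data.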
Azure Advisor is a personalized cloud consultant that helps you follow best practices to optimize your Azure deployments. It analyzes your resource configuration and usage telemetry. It then recommends solutions to help improve the performance, security, and high availability of your resources while looking for opportunities to reduce your overall Azure costs.
With much of our cloud infrastructure in place, we recognized the need to optimize our Azure resources. We created Azure Resource Optimization (ARO), a combination of tools, processes, and education to help Microsoft teams examine both their total cost of cloud resources and the number of underutilized assets. To identify cost savings opportunities, we evaluate underutilized resources by type, such as IaaS virtual machines, Azure SQL databases, PaaS web and worker roles, Azure storage, virtual networks, and IPs.
Some examples of ARO recommendations include adjusting SKU sizes, deleting unused resources, or turning off resources during downtime. The overall ARO goal is to increase awareness of consumption, optimization, and cost of Azure resources across Microsoft, to encourage engineers, managers, and leadership to adopt cost-effective behaviors. We deliver business intelligence to help people make key decisions about Azure usage, which will promote a culture of cloud optimization.
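The three kinds of recommendation named above can be pictured as simple rules over usage data. The sketch below is a hypothetical illustration, not the ARO implementation: the `ResourceUsage` fields, the thresholds, and the order in which the rules are checked are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    name: str
    avg_cpu_percent: float             # average CPU over the review window
    days_since_last_use: int           # 0 means used today
    used_outside_business_hours: bool  # any activity at night or on weekends


# Hypothetical thresholds, chosen only for illustration.
UNUSED_AFTER_DAYS = 30
LOW_CPU_PERCENT = 10.0


def recommend(resource: ResourceUsage) -> str:
    """Return one recommendation for a resource, checking rules in order."""
    if resource.days_since_last_use >= UNUSED_AFTER_DAYS:
        return "delete unused resource"
    if resource.avg_cpu_percent < LOW_CPU_PERCENT:
        return "adjust to a smaller SKU"
    if not resource.used_outside_business_hours:
        return "turn off during downtime"
    return "no change"


fleet = [
    ResourceUsage("vm-legacy-01", 1.0, 90, False),
    ResourceUsage("vm-web-02", 4.5, 0, True),
    ResourceUsage("vm-build-03", 55.0, 0, False),
    ResourceUsage("vm-api-04", 60.0, 0, True),
]

for resource in fleet:
    print(f"{resource.name}: {recommend(resource)}")
```

Returning a single recommendation per resource keeps the output easy to act on. A real system would likely weigh cost impact and combine several signals.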
To implement our cloud-first transformation effectively and quickly, we formed engagement and program management teams to connect with our internal business partners, identify their needs, prioritize features, and deliver them with focused discipline. Individuals who can code Azure infrastructure solutions as APIs, PowerShell scripts, and templates were united as software engineering teams. And we grouped all the manageability services under service engineering teams to provide reliable, available, and supportable services.
All other IT operations support teams were decentralized and integrated into application teams using the DevOps model to improve issue resolution time. Employees learned new skills, and we hired new people with needed skills. Assessing, refining, and hiring the right talent is part of organization hygiene.
Accelerating our transformation to Azure by changing roles, investing in new skills, and simplifying operations processes had several important benefits.
More productive workforce
- CSE ecosystem is 90 percent in Azure (IaaS mostly).
- We shifted to a self-service culture.
- DevOps is in practice.
More agile business
- Provisioning speed was increased 40-fold by simplifying operations processes and using native cloud solutions.
- Customized IT tools were reduced 60 percent.
- CPU utilization increased 400 percent.
- Annual cloud spending was reduced 38 percent.
- On-premises IT datacenters and labs have been decommissioned across our production ecosystem.
Improved business partner experience
- We have improved the user experience and engagement with our business partners. We have shared practices and lessons learned across our company and industry.
Lessons learned in Phase 2
To make our digital transformation to Azure a success, we had to:
- Redesign strategic assets as Platform as a Service (PaaS) solutions.
- Integrate engineering and manageability platforms.
- Use data as a strategic asset.
- Use predictive analytics and machine learning to prevent and remediate failures.
Phase 3: Embracing the digital ecosystem
Our ability to take advantage of emerging technologies and to embrace new business strategies will be a deciding factor in the modern era. Going forward, CSE teams will be organized around end-to-end ownership of services that delight our business partners and that focus on innovation, co-creation, and collaboration.
Our first phase of transformation focused on migrating infrastructure and automating processes to drive efficiency and lower operations costs. The second phase was driven by adopting the Azure platform, simplifying operations processes, and changing operations roles to invest in engineering, customer service, and native cloud solutions.
The next stage includes developing intelligent systems on Azure to deliver reliable, scalable services and to connect operations processes across Microsoft. Bots will support basic user queries, while service reliability engineers strive to predict and remediate failures using predictive analytics and machine learning. Our focus is on operational resilience and cost avoidance. Several industry trends drive the continued evolution of our digital IT ecosystem:
- DevOps culture accelerates engineering team deliverables and decisions using a boundary-free flow of information and frictionless processes.
- Native cloud solutions offer an enterprise-level manageability platform that supports decentralized services and enables flexible, predictable, reliable response to changes with speed.
- Data has become a durable asset. With the proliferation of cloud infrastructure, mobile applications, and IoT devices, there is a growing need to store massive amounts of data and analyze it in near-real time to predict patterns, build models, and drive intelligent actions among end-user communities.
- Open source standards increasingly support a platform for innovation, moving to the cloud, and enabling community governance at scale to balance the need for security with agility.
- CSE as a services broker shifts our engineering focus from system design/build to assembly, configuration, and integration of specialized third-party software components. We can accelerate the time to value and reduce technical debt.
Figure 5 shows how our digital transformation and move to the cloud will use automation, enhanced resiliency, predictive analytics, and bots to integrate business partner feedback and improve service to our business partners.
We recognized that our business partners need hybrid cloud scale and economics, which we can provide by offering enterprise-level engineering and management platforms. We have embraced the industry trends of mobility, IoT, machine learning, AI, open source, and cross-platform standards. Together, Azure PaaS, Visual Studio Online, and AppInsights will enable engineers to focus on features and usability, while ARM fabric and OMS will provide a single-pane-of-glass view to provision, manage, and decommission infrastructure resources securely. Only by optimizing the engineering and manageability processes, both independently and in concert with each other, can we achieve the digital transformation goals for Microsoft.
CSE plays an influential role in the digital transformation of the company. Our evolution and move to Azure is anchored around the idea of building connected intelligence systems to transform how we engage with business partners, empower engineers, optimize operations, and reinvent products. Delivering excellence will drive the cultural change to modern practices.
With connected systems, simplified self-service provisioning, and a focus on our business partners, we can scale our infrastructure service offerings across the company and drive innovation, business agility, and productivity. In the process, we will also reduce costs and improve our operations resilience.
For more information
Microsoft IT Showcase
© 2019 Microsoft Corporation. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.