Microsoft moves IT infrastructure management to the cloud with Microsoft Azure

Nov 3, 2023   |  

Microsoft Digital technical storiesWe’re transforming our IT infrastructure management internally here at Microsoft.

At Microsoft Digital Employee Experience (MDEE), we’re embracing our digital transformation and the culture changes that comes with it. With over 98 percent of our IT infrastructure in the cloud, we’re adopting Microsoft Azure monitoring, patching, backup, and security tools to create a customer-focused self-service management environment centered around Microsoft Azure DevOps and modern engineering principles. As we continue to benefit from the growing feature set of Azure management tools, we’ll deliver a fully automated, self-service management solution that gives us visibility over our entire IT environment.

The result?

Business groups at Microsoft will be able to adapt IT services to best fit their needs.

[Explore shining a light on how Microsoft manages Shadow IT. | Discover enabling a modern support experience at Microsoft. | Unpack creating the digital workplace at Microsoft.]

For a transcript, please view the video on YouTube: https://www.youtube.com/watch?v=C1PEhAT1Cns, select the “More actions” button (three dots icon) below the video, and then select “Show transcript.”

Microsoft experts share the processes and tools used to move our monitoring services into Azure. They discuss how we utilized solutions that use native Azure functionality to recreate certain SCOM functions and views in Azure Monitor. You will also learn how DevOps teams use log analytics to gain more visibility into end-to-end application performance.

Digital transformation at Microsoft

Our MDEE team is a global IT organization that strives to meet Microsoft business needs. Microsoft Azure is the default platform for our IT infrastructure. We host 98 percent of our IT infrastructure in the cloud. Here are a few details:

  • More than 220,000 employees
  • 150 countries
  • 587 locations
  • 1,400 Azure subscriptions
  • 1,600 Azure-based applications
  • 17,000 Azure infrastructure-as-a-service (IaaS) virtual machines
  • 643,000 managed devices

Like most IT organizations, we have our roots in the datacenter. In the past, our traditional hosting services were mostly physical, on-premises environments that consisted of servers, storage, and network devices. Most of the devices were owned and maintained for specific business functions. Technologies were very diverse and needed people with specialized skills to design, deploy, and run them. Our achievements were limited by the time required to plan and implement the infrastructure to support the business.

As technology evolved, we began to move out of the datacenter and into the cloud. Cloud-based infrastructure created new opportunities for us and has transformed the IT infrastructure we manage. We continue to grow and adapt in a constantly changing IT landscape.

Traditional IT technologies, processes, and teams

Our traditional datacenters were managed by a legion of IT pros, who supported the diverse platforms and systems that made up our infrastructure. Physical servers, and later virtual servers, numbered in the tens of thousands, spanning multiple datacenters and comprising a mass of metal and silicon to be managed and maintained. Platform technologies ranged from Windows, SQL Server, BizTalk, and SharePoint farms to third-party solutions such as SAP and other information security-related tool sets. Server virtualization evolved from Hyper-V to System Center Virtual Machine Manager and System Center Orchestrator.

To provide a stable infrastructure, we used structured frameworks, such as IT Infrastructure Library/Managed Object Format (ITIL/MOF). Policies, processes, and procedures in the framework helped to enforce and control security and availability, and to prevent failures. Microsoft product engineering groups that used hosting services had a similar adoption process for their application and service needs, which were based on ITIL/MOF.

This model worked well for traditional IT infrastructure, but things began to change when cloud computing and Microsoft Azure began to influence the IT landscape.

Evolution of the hybrid cloud

As IT infrastructure and services began to move to the cloud, the nature of the cloud and how we treat it changed. We’ve now been hosting IT services in Microsoft Azure for a long time, and as Azure has evolved and grown, so has our engagement with Azure services and the volume of our IT services hosted in Azure.

Early Azure: IT-owned, IaaS, and lift-and-shift

In the early years, Microsoft Azure was IT only. We had full control of cloud development, implementation, and management. We could create and manage solutions in Azure, but it was a siloed service.

The infrastructure consisted primarily of IaaS virtual machines that hosted workloads in the cloud the same way that they hosted workloads in on-premises datacenters.

Efficiency gains were small and infrastructure management still used the same tools—sometimes hosted in the cloud and sometimes hosted on-premises and connected to the cloud. It was very much a lift-and-shift migration from the datacenter to the cloud, and our management processes imitated the on‑premises model in much the same way.

The datacenter remained the focus, but that was changing.

Microsoft Azure evolves: PaaS, co-ownership, and cloud-first

As Microsoft Azure matured and more of our infrastructure and services moved to the cloud, we began to move away from IT‑owned applications and services. The strengths of the Azure self-service and management features meant that a business group could handle many of the duties that we offered as an IT service provider—which meant that they could build solutions that were more agile and responsive to their needs.

Microsoft Azure platform-as-a-service (PaaS) functionality matured, and the focus moved from IaaS-based solutions to PaaS-based solutions. Azure became the default target for IT solutions; datacenter decommissioning began as more solutions moved to or were created in Azure. Monitoring and management was becoming cloud-focused as we pointed more of our System Center Operations Manager (SCOM) and System Center Configuration Manager (SCCM) instances at the cloud. Azure-native management started to mature.

Large-scale Azure: Service line–owned, IT-managed, PaaS-first

PaaS quickly became a focus for developers in our business groups, as they realized the agility and scalability they could achieve with PaaS-based solutions. Those developers shifted to PaaS for applications as we transitioned away from IaaS and virtual machine-based solutions.

With the advent of Microsoft Azure Resource Manager, which permitted a broader level of user control over Microsoft Azure services, we saw service lines begin to take ownership of their solutions, and business groups started to manage their own Azure resources.

The datacenter became an inconvenient necessity for apps that couldn’t move to Microsoft Azure. We still used SCOM and SCCM as the primary monitoring and management tools, but we had moved almost all our instances into IaaS implementations in Azure. Azure-native management became a mature product, and we started to plan and deliver a completely cloud-based management environment.

Microsoft Azure in a DevOps culture: Service linemanaged, Internet-first, business-first

We’re continually nurturing a DevOps culture—DevOps has transformed the way that Microsoft Azure solutions are developed and operated. Our Azure solutions offer an end-to-end view for our business groups. They’re agile, dynamic, and data-intensive. Continuous integration and continuous development create a continual state of improvements and feature releases.

The Microsoft Azure solutions that our business groups use are designed to respond to their business needs. We actively seek and use Azure-native tools for control over and insight into IT environments, in Azure first, but also, back to the datacenter where required. We’re a long, long way past managing a stack of metal. The modern workplace is here at Microsoft, and it changes every day.

Realizing digital transformation

In the modern workplace, the developers and IT decision makers in our business groups have an increasingly critical business role. Our business groups need the autonomy to make IT decisions that serve their business needs in the best way possible. With 98 percent of our IT infrastructure in Microsoft Azure, we’re increasingly looking to the agility, scale, and manageability that Azure provides. Using this scale, we solve business needs and provide the framework for a complete IT organization, from infrastructure to development to management.

Managing the modern hybrid cloud

Our modern hybrid cloud is 98 percent Microsoft Azure—and Azure is the primary platform for infrastructure and management tools. Azure is not only the default platform for IT solutions—it is our IT solution.

Just as PC sprawl occurred in the late 1990s and server sprawl did the same thing in the 2000s, cloud sprawl is a growing reality. Implementing new cloud solutions to manage the cloud environment and the remaining on-premises infrastructure is critical for our organization. The new Cloud solutions scope includes the flexibility for our engineers to leverage PaaS, Functions, and Container models to optimize the management of Cloud Environments.

Embracing decentralized IT

Decentralized IT services are a big part of digital transformation. We need a management solution that offers us—and our business groups—what we need to manage our IT environments. We always want to maintain governance over security and compliance of Microsoft as a whole, but we also realize that decentralized IT services are the most suitable model for a cloud-first organization.

By decentralizing services and ownership in Microsoft Azure, we offer our business groups several benefits:

  • Greater DevOps flexibility.
  • A native cloud experience: subscription owners can use features as soon as they’re available.
  • Freedom to choose from marketplace solutions.
  • Minimal subscription limit issues.
  • Greater control over groups and permissions.
  • Greater control over Microsoft Azure provisioning and subscriptions.
  • Business group ownership of billing and capacity management.

Our goal in the management of modern hybrid cloud continues to be a solution that transforms IT tasks into self-service native cloud solutions for monitoring, management, backup, and security across our entire environment. With this solution, our business groups and service lines have reliable, standardized management tools, and we can maintain control over and visibility into security and compliance for our entire organization.

The areas where we retain oversight include:

  • General IT and operational policy implementation, as approved by the subscription owner. Areas include compliance, operations, and incident management.
  • Shared network connectivity over Microsoft Azure ExpressRoute, as needed.
  • Visibility into infrastructure inefficiencies and self-service tool development.

Our management solution must be as agile as the solutions we manage, and we provide best practices, standards, and consulting for Microsoft Azure management solutions to ensure that our business groups are getting the most out of the platform.

Supporting digital transformation with Microsoft Azure management tools

Managing the hybrid cloud in Microsoft Azure encompasses a wide range of services and activities. For our business groups to improve, they need to monitor their apps and solutions to recognize issues and opportunities. They need a patching and management solution that keeps systems up to date, manages configuration, and automates common maintenance tasks.

We must protect data with a disaster recovery platform and ensure security and compliance for business groups and the entire company. We use the management tools in Microsoft Azure to enable hybrid cloud management.

Monitoring the hybrid cloud

Monitoring is an essential task for our business groups and their service lines. They need to understand how their apps are performing (or not performing) and have insight into their environment. We’ve used SCOM for monitoring at Microsoft for more than 10 years—and a certain rhythm develops when you use a product for that long.

To ease the transition from SCOM to Microsoft Azure monitoring, we developed transition solutions that use native Azure functionality to recreate certain SCOM functions and views in Microsoft Azure Monitor.

The transition solutions consist primarily of PowerShell scripts and documentation. They give our business groups a familiar environment to work in while they become familiar with Microsoft Azure monitoring.

Our business groups can also start in a standardized environment with our built-in tested security and compliance components. This helps us maintain a centralized standard while allowing for decentralized monitoring. We maintain metrics for critical organizational services, but we leave operational monitoring to each business group.

Our Microsoft Azure monitoring is designed to:

  • Create visibility. We’re providing instant access to a foundation set of metrics, alerts, and notifications across core Azure services for all business units. Microsoft Azure Monitoring also covers production and non-production environments, as well as native monitoring support across Microsoft Azure DevOps.
  • Provide insight. Business groups and service lines can view rich analytics and diagnostics across applications, as well as compute, storage, and network resources, including anomaly detection and proactive alerting.
  • Enable optimization. Monitoring results help our business groups and service lines understand how users are engaging with their applications, identify sticking points, develop cohorts, and optimize the business impact of their solutions.
  • Deliver Extensibility. Designed for extensibility to enable support for custom event ingestion, and broader analytics scenarios.

We’ve now retired our SCOM environment, leaving Microsoft Azure monitoring as the default for both cloud and on-premises monitoring now focusing on:

  • Automated installation and repair of the Microsoft Monitoring Agent using Microsoft Azure Runbooks.
  • Centralized visibility into comprehensive health and performance.
  • Fully featured transition solution development to enable complete self-service monitoring in Microsoft Azure.
  • Complete transition from SCOM to Microsoft Azure.

Patching, updating, and inventory management

As we’ve done for monitoring, we’re using transition solutions to make it easier for business groups to transition from previously used on-premises tools to Microsoft Azure.

Our patching processes depended on our preexisting solutions as we worked through the transition to Microsoft Azure. SCCM and associated agents provided the bulk of our patching, software distribution, and management process, but we’ve moved to Azure in a phased approach as our Azure subscriptions become ready to transition to Azure for management.

We’ve built transition solutions for our business groups to help them transition from the SCCM platform and other legacy tools to the Microsoft Azure update management patching service. We’re maintaining and modifying these transition solutions as Azure features replaced the on-premises functionality.

From a patching and management perspective, we’re focusing on:

  • The transition of inventory management from Configuration Manager to Microsoft Azure, including discovery, tracking, and management of IT assets.
  • Transition of our update processes to Microsoft Azure Update Management for business groups.
  • Enabling self-service patch management. We’re developing an orchestrated deployment of operating system and application updates with Microsoft Azure, including centralized compliance reporting.
  • Creating and updating solutions to support the transition of the above areas, including Resource Manager templates, PowerShell scripts, documentation, and Microsoft Azure Desired State Configuration.

The design for patching and management, as with monitoring, is to provide an Microsoft Azure-based self-service solution for our business groups that gives them control over their patching and management environment while giving us the ability to centrally monitor for compliance and security purposes.

Ensuring recoverable data

With Microsoft Azure as the primary repository for business data, it’s extremely important to have an Azure backup solution with which our business groups and service lines can safeguard, retain, and recover their data.

Our data recovery solutions address the following major areas of concern:

  • Recover business data from attacks by malicious software or malicious activity.
  • Recover from accidental deletion or data corruption.
  • Secure critical business data.
  • Maintain compliance standards.
  • Provide historical data recovery requirements for legal purposes.

Our Microsoft Azure data footprint is immense. We currently host 1.5 petabytes of raw data in Azure and use almost nine petabytes of storage to back up that data.

We’re using Microsoft Azure Backup as a self-service solution. It gives business groups more control over how they perform their backups and gives them responsibility for backing up their business data—because each business group knows its data better than anyone else.

We’re also using Microsoft Azure Backup for virtual machine-level backup, and we’re backing up some on-premises data to Microsoft Azure using Microsoft Azure Recovery Services vaults. We’ve created a packaged solution for backup management in Azure that consists of scripts and documentation—our business groups can use it to migrate to Azure Backup quickly and efficiently.

As with other areas of enterprise management, we’re evaluating new features for Microsoft Azure Backup that will offer more backup capabilities to our business groups.

Embedding security and compliance

Decentralization gets the greatest scrutiny when it comes to security and compliance. We’re responsible for security and legal compliance for the organization, so our security controls are the most centralized of all the cloud management solutions we implement. However, centralization does not directly affect day-to-day solution management for our business groups and their service lines.

We leveraged a broad set of security and compliance practices and tools that are generally applied across all Microsoft Azure subscriptions. The following imperatives govern the general application of security and compliance measures:

  • Microsoft Azure Policy. Using Azure Policy, we establish guardrails in subscriptions that keep our service engineers within governance boundaries automatically. Policy can help control a myriad of settings by default, including limiting the network configurations to safe patterns, controlling the regions and types of Microsoft Azure resources available for use, and ensuring data is stored with encryption enabled.
  • Automation gives us a chance to keep pace with the constantly changing cloud environment. DevOps is heavily centered on end-to-end automation, and we need to complement DevOps automation with automated security. Automated security saves significant time and cost for apps that are frequently updated, and we can quickly and consistently configure and deploy security.
  • Empower engineering teams. In an environment where change is constant, we want to empower our engineering teams to make meaningful, consistent changes without waiting for a central security team to approve an app. Our engineers need the ability to integrate security into the DevOps workflow. They don’t have to take extra measures to be secure, nor do they need to wait for a central security team to approve an app.
  • Maintain continuous assurance. When development and deployment are continuous, everything that goes with them needs to follow suit—including security assurance. The old requirements for sign-offs or compliance checks create tension in the modern engineering environment. We want to define a security state and track drift from that state to maintain a consistent level of security assurance across the entire environment. This helps ensure that builds and deployments that are secure when they’re delivered stay secure from one release iteration to the next and beyond.
  • Set up operational hygiene. We need a clear view of our DevOps environment to ensure operational hygiene. In addition to understanding operational risks in the cloud, DevOps operational hygiene in the cloud requires a different perspective. We need to create the ability to see the security state across DevOps stages and establish capabilities to receive security alerts and reminders for important periodic activities.

Key Takeaways

At MDEE, our goal is a completely cloud-based, self-service management solution that gives our business groups concise control over their environments using Microsoft Azure tools and features. We’ll continue to offer updated Azure-based solutions, transitioning away from on-premises, System Center–based management.

As we continue to transition business groups to cloud-based monitoring, we’re growing our feature set and making our Microsoft Azure-based management even better. We envision a near future where our management systems will be completely cloud based, decentralized, and automated—and our organization continuing to build our business in Azure.

Related links

We'd like to hear from you!
Please share your feedback with us—take our survey and let us know what kind of content is most useful to you.

Tags: , ,