Microsoft Digital has reduced Azure costs by adopting data-driven, cost-optimization techniques; investing in central governance; and driving modernization efforts across all Azure workloads.

Microsoft Digital Employee Experience is aggressively pursuing cost optimization in Microsoft Azure as part of the continuing effort to improve the efficiency and effectiveness of the enterprise Azure environment at Microsoft and for Microsoft customers. By adopting data-driven cost-optimization techniques, investing in central governance, and driving modernization efforts throughout the Azure environment, Microsoft Digital is ensuring that the largest enterprise environment hosted in Azure is also the most cost efficient and a blueprint for our Azure customers.

Introduction

Microsoft Digital began its digital transformation journey in 2014 with the bold decision to migrate on-premises infrastructure to Azure to capture the benefits of a cloud-based platform: agility, elasticity, and scalability. Since then, our teams have progressively migrated and transformed our IT footprint into the largest cloud-based infrastructure in the world; we host more than 95 percent of our IT resources in Microsoft Azure.

The Azure platform has expanded over the years with the addition of hundreds of services, dozens of regions, and innumerable improvements and new features. In tandem, we've increased our investment in Azure as our core destination for business solutions at Microsoft. As our Azure footprint has grown, so has the environment's complexity, requiring us to optimize and control our Azure expenditures.

Optimizing cost in Azure at Microsoft Digital

The Microsoft Digital footprint follows the resource usage of a typical large-scale enterprise. In the past few years, our cost-optimization efforts have become more targeted as we worked to contain the total cost of ownership in Azure, which was rising because of several factors, including increased migrations from on-premises and business growth. This focus on optimization prompted an investment in tools and data insights for cost optimization in Azure.

The built-in tools and data that Azure provides form the core of our cost-optimization toolset. We derive all our cost-optimization tools and insights from data in Azure Advisor, Azure Cost Management and Billing, and Azure Monitor. We’ve also implemented design optimizations based on modern Azure resource offerings. We extract recommendations from Azure Advisor across the different Azure service categories and push those recommendations into our IT service management system, where the services' owners can track and manage the implementation of recommendations for their services.
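As a minimal sketch of the flow described above (not the production pipeline), the following Python groups already-exported Advisor recommendations by service owner and turns each cost recommendation into a work-item record. The field names (`category`, `impact`, `problem`, `resource_id`, `service_owner`), the `WorkItem` shape, the priority mapping, and the sample data are all hypothetical; a real integration would read recommendations from Azure Advisor and call the IT service management system's own API, neither of which is shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WorkItem:
    """Hypothetical ticket shape; a real ITSM system defines its own schema."""
    owner: str
    title: str
    priority: int


# Assumed mapping from recommendation impact to ticket priority (1 = most urgent).
IMPACT_TO_PRIORITY = {"High": 1, "Medium": 2, "Low": 3}


def build_work_items(recommendations):
    """Create one work item per cost recommendation, grouped by service owner."""
    by_owner = defaultdict(list)
    for rec in recommendations:
        if rec["category"] != "Cost":
            continue  # this sketch routes cost recommendations only
        by_owner[rec["service_owner"]].append(
            WorkItem(
                owner=rec["service_owner"],
                title=f"{rec['problem']} ({rec['resource_id']})",
                priority=IMPACT_TO_PRIORITY.get(rec["impact"], 3),
            )
        )
    # Put the most urgent items first in each owner's queue.
    for items in by_owner.values():
        items.sort(key=lambda item: item.priority)
    return dict(by_owner)


# Illustrative input only; these records are invented for the example.
sample_recommendations = [
    {"category": "Cost", "impact": "Low", "problem": "Right-size underutilized VM",
     "resource_id": "vm-01", "service_owner": "team-a"},
    {"category": "Cost", "impact": "High", "problem": "Switch to serverless tier",
     "resource_id": "sql-02", "service_owner": "team-a"},
    {"category": "Security", "impact": "High", "problem": "Enable disk encryption",
     "resource_id": "vm-03", "service_owner": "team-b"},
]

queues = build_work_items(sample_recommendations)
for owner, items in queues.items():
    for item in items:
        print(owner, item.priority, item.title)
```

Grouping by owner mirrors the idea that each service owner tracks and implements the recommendations for their own services.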

Understanding holistic optimization

As the first and largest adopter of Azure, we’ve developed best practices for engineering and maintenance in Azure that support not only cost optimization but also a comprehensive approach to capturing the benefits of cloud computing in Azure. Microsoft developed and refined the Microsoft Well-Architected Framework as a set of guiding tenets for Azure workload modernization and a standard for modern engineering in Azure. Cost optimization is one of five pillars in the Well-Architected Framework that work together to support an efficient and effective Azure footprint. The other pillars are reliability, security, operational excellence, and performance efficiency. Cost optimization in Azure isn't only about reducing spending. In Azure’s pay-for-what-you-use model, using only the resources we need when we need them, in the most efficient way possible, is the critical first step toward optimization.

Optimization through modernization

Reducing our dependency on legacy application architecture and technology was an important part of our first efforts in cost optimization. We migrated many of our workloads from on-premises to Azure by using a lift-and-shift method: imaging servers or virtual machines exactly as they existed in the datacenter and migrating those images into virtual machines hosted in Azure. Moving forward, we’ve focused on transitioning those infrastructure as a service (IaaS) based workloads to platform as a service (PaaS) components in Azure to modernize the infrastructure on which our solutions run.

Focus areas for optimization

We’ve maintained several focus areas for optimization. Ensuring the correct sizing for IaaS virtual machines was critical early in our Azure adoption journey, when those machines accounted for a sizable portion of our Azure resources. We currently operate at a ratio of 80 percent PaaS to 20 percent IaaS, and to achieve this ratio we've migrated workloads from IaaS to PaaS wherever feasible. This means transitioning away from workloads hosted within virtual machines and moving toward more modular services such as Azure App Service, Azure Functions, Azure Kubernetes Service, Azure SQL, and Azure Cosmos DB. PaaS services like these offer better native optimization capabilities in Azure than virtual machines, such as automatic scaling and broader service integration. As the number of PaaS services has increased, automating scalability and elasticity across PaaS services has been a large part of our cost-optimization process. Data storage and distribution has been another primary focus area as we modify scaling, size, and data retention configuration for Azure Storage, Azure SQL, Azure Cosmos DB, Azure Data Lake, and other Azure storage-based services.

Implementing practical cost optimization

While Azure Advisor provides most recommendations at the individual service level—Azure Virtual Machines, for example—implementing these recommendations often takes place at the application or solution level. Application owners implement, manage, and monitor recommendations to ensure continued operation, account for dependencies, and keep the responsibility for business operations within the appropriate business group at Microsoft.

For example, we performed a lift-and-shift migration of our on-premises virtual lab services into Azure. The resulting Azure environment used IaaS-based Azure virtual machines configured with nested virtualization. The initial scale was manageable using the nested virtualization model. However, the Azure-based solution was more convenient for hosting workloads than the on-premises solution, so adoption began to increase exponentially, which made management of the IaaS-based solution more difficult. To address these challenges, the engineering team responsible for the virtual lab environment re-architected the nested virtual machine design to incorporate a PaaS model using microservices and Azure-native capabilities. This design made the virtual lab environment more easily scalable, efficient, and resilient. The re-architecture addressed the functional challenges of the IaaS-based solution and reduced Azure costs for the virtual lab by more than 50 percent.

In another example, an application used Azure Functions with the Premium App Service Plan tier to account for long-running functions that wouldn’t run properly without the extended execution time enabled by the Premium tier. The engineering team converted the logic in the Function Apps to use Durable Functions, an Azure Functions extension, and more efficient function-chaining patterns. This reduced execution time to less than 10 minutes, which allowed the team to switch the Function Apps to the Consumption tier, reducing cost by 82 percent.
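The function-chaining idea can be illustrated in plain Python. This is a sketch of the pattern only, not the Durable Functions API and not the team's code: one long-running job is split into short steps, each step's output feeds the next, and every step is checked against a time limit. The step functions (`load_rows`, `total`, `format_report`) and the `run_chain` helper are hypothetical, and the 600-second limit is taken from the 10-minute figure mentioned above.

```python
import time

# Assumed per-step limit for this sketch, based on the 10-minute figure above.
STEP_LIMIT_SECONDS = 600


def run_chain(steps, initial_input, limit_seconds=STEP_LIMIT_SECONDS):
    """Run steps in order, passing each result to the next and timing each step."""
    value = initial_input
    timings = []
    for step in steps:
        started = time.monotonic()
        value = step(value)
        elapsed = time.monotonic() - started
        if elapsed > limit_seconds:
            raise RuntimeError(f"{step.__name__} took {elapsed:.0f}s, over the limit")
        timings.append((step.__name__, elapsed))
    return value, timings


# Hypothetical short steps standing in for pieces of a formerly long-running function.
def load_rows(batch_size):
    return list(range(1, batch_size + 1))


def total(rows):
    return sum(rows)


def format_report(amount):
    return f"total={amount}"


result, timings = run_chain([load_rows, total, format_report], 4)
print(result)
```

In an orchestration framework, each step would run as its own short execution with state saved between steps; here the loop in `run_chain` simply stands in for the orchestrator.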

Governance

To ensure effective identification and implementation of recommendations, governance in cost optimization is critical for our applications and the Azure services that those applications use. Our governance model provides centralized control and coordination for all cost-optimization efforts. Our model consists of several important components, including:

  • Azure Advisor recommendations and automation. Advisor cost management recommendations serve as the basis for our optimization efforts. We channel Advisor recommendations into our IT service management and Azure DevOps environment to better track how we implement recommendations and ensure effective optimization.
  • Tailored cost insights. We’ve developed dashboards to identify the costliest applications and business groups and identify opportunities for optimization. The data that these dashboards provide helps empower engineering leaders to observe and track important Azure cost components in their service hierarchy to ensure that optimization is effective.
  • Improved Azure budget management. We perform our Azure budget planning by using a bottom-up approach that involves our finance and engineering teams. Open communication and transparency in planning are important, and we track forecasts for the year alongside actual spending to date to enable accurate adjustments to spending estimates and closely track our budget targets. Relevant and easily accessible spending data helps us identify trend-based anomalies to control unintentional spending that can happen when resources are scaled or allocated unnecessarily in complex environments.
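The forecast-versus-actual tracking described in the budget bullet reduces to simple arithmetic, sketched below under stated assumptions: spending is tracked per month, the year-end projection is actual spending to date plus the forecast for the remaining months, and the function names and figures are invented for illustration.

```python
def project_year_end(actual_by_month, forecast_by_month):
    """Actual spending to date plus the forecast for months not yet closed."""
    closed_months = len(actual_by_month)
    return sum(actual_by_month) + sum(forecast_by_month[closed_months:])


def budget_variance(actual_by_month, forecast_by_month, annual_budget):
    """Return the projected year-end spend and its relative gap to the budget."""
    projected = project_year_end(actual_by_month, forecast_by_month)
    return projected, (projected - annual_budget) / annual_budget


# Illustrative figures only (arbitrary currency units).
forecast = [100] * 12          # planned spend for each month of the year
actual = [100, 110, 120]       # three months closed so far
projected, variance = budget_variance(actual, forecast, annual_budget=1200)
print(f"projected={projected}, variance={variance:.1%}")
```

A positive variance early in the year is the signal to revisit either the forecast or the resources driving the overspend.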

Implementing a governance solution has enabled us to realize considerable savings by making a simple change to Azure resources across our entire footprint. For example, we implemented a recommendation to convert Azure SQL Database instances from the Standard database transaction unit (DTU) based tier to the General Purpose Serverless tier by using a simple Azure Resource Manager template and the auto-pause capability. The configuration change reduced costs by 97 percent.
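To make the serverless example more concrete, the sketch below builds a minimal Azure Resource Manager template as a Python dictionary and prints it as JSON. It is an illustration under assumptions, not the template we used: the server and database names, the vCore values, the 60-minute auto-pause delay, and the `apiVersion` are placeholders, and the property names (`autoPauseDelay`, `minCapacity`, and the `sku` fields) should be checked against the current `Microsoft.Sql/servers/databases` template reference before use.

```python
import json


def serverless_template(server_name, database_name, location,
                        auto_pause_minutes=60, min_vcores=0.5, max_vcores=1):
    """Build an ARM template for a General Purpose serverless database (sketch)."""
    database = {
        "type": "Microsoft.Sql/servers/databases",
        "apiVersion": "2021-11-01",  # assumed; confirm the version you target
        "name": f"{server_name}/{database_name}",
        "location": location,
        "sku": {
            "name": "GP_S_Gen5",       # General Purpose, serverless, Gen5 hardware
            "tier": "GeneralPurpose",
            "family": "Gen5",
            "capacity": max_vcores,    # upper bound on vCores
        },
        "properties": {
            "autoPauseDelay": auto_pause_minutes,  # idle minutes before pausing
            "minCapacity": min_vcores,             # lower bound on vCores
        },
    }
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [database],
    }


# Placeholder names; substitute your own server, database, and region.
template = serverless_template("contoso-sql", "orders", "westus2")
print(json.dumps(template, indent=2))
```

With auto-pause enabled, a database that sits idle stops accruing compute charges, which is where savings of this kind come from for intermittently used workloads.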

Benefits

Ongoing optimization in Azure has enabled Microsoft Digital to capture the value of Azure to help increase revenue and grow our business. Our yearly budget for Azure has remained almost static since 2014, when we hosted most of our IT resources in on-premises datacenters. Over that period, Microsoft has grown by more than 20 percent.

Our recent optimization efforts have resulted in significantly reduced spending across numerous Azure services. Examples, in addition to those already mentioned, include:

  • Right-sizing Azure virtual machines. We generated more than 300 recommendations for VM size changes to increase cost efficiency. These recommendations included switching to burstable virtual machine sizes and accounted for a 15 percent cost savings.
  • Moving virtual machines to the latest generation of virtual machine sizes. Moving from older D-series and E-series VM sizes to their current counterparts generated almost 2,500 recommendations and a cost savings of approximately 30 percent.
  • Implementing Azure Data Explorer recommendations. More than 200 recommendations were made for Azure Data Explorer optimization, resulting in significant savings.
  • Incorporating Cosmos DB recommendations. More than 170 Cosmos DB recommendations reduced cost by 11 percent.
  • Implementing Azure Data Lake recommendations. More than 30 Azure Data Lake recommendations combined to reduce costs by approximately 15 percent.
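As a rough illustration of how right-sizing candidates like those in the list above might be identified from utilization data, the sketch below classifies a virtual machine by its CPU samples. The thresholds, the three labels, and the sample data are assumptions made for the example; they aren't the rules that Azure Advisor applies.

```python
from statistics import mean


def classify_vm(cpu_samples, low_avg=10.0, low_peak=30.0, burst_peak=80.0):
    """Label a VM from CPU utilization percentages (hypothetical thresholds)."""
    average = mean(cpu_samples)
    peak = max(cpu_samples)
    if average < low_avg and peak >= burst_peak:
        return "burstable"  # mostly idle with occasional spikes
    if average < low_avg and peak < low_peak:
        return "downsize"   # consistently underutilized
    return "keep"


# Illustrative CPU percentages per VM.
fleet = {
    "vm-batch": [2] * 19 + [90],
    "vm-idle": [3, 4, 5, 6, 4],
    "vm-busy": [40, 55, 60, 70],
}
labels = {name: classify_vm(samples) for name, samples in fleet.items()}
print(labels)
```

Mostly idle machines with short spikes are the natural fit for burstable sizes, while machines that never approach their capacity are candidates for a smaller size.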

Lessons learned

Cost optimization in Azure can be a complicated process that requires significant effort from several parts of the enterprise. The following are some of the most important lessons that we’ve taken from our cost-optimization journey:

Implement central governance with local accountability

We implemented a central audit of our Azure cost-optimization efforts to help improve our Azure budget-management processes. This audit enabled us to identify gaps in our methods and make the necessary engineering changes to address those gaps. Our centralized governance model includes weekly and monthly leadership team reviews of our optimization efforts. These meetings allow us to align our efforts with business priorities and assess the impact across the organization. The service owner still owns and is accountable for their optimization effort.

Use a data-driven approach

Using optimization-relevant metrics and monitoring from Azure Monitor is critical to fully understanding the necessity and impact of optimization across services and business groups. Accurate and current data is the basis for making timely optimization decisions that provide the largest cost savings possible and prevent unnecessary spending.

Be proactive

Real-time data and effective cost optimization enable proactive cost-management practices. Cost-management recommendations provide no financial benefit until they're implemented. Getting from recommendation to implementation as quickly as possible while maintaining governance over the process is the key to maximizing cost-optimization benefits.

Adopt modern engineering practices

Cost optimization is one of the five pillars of the Microsoft Azure Well-Architected Framework, and each pillar functions best when supported by proper implementation of the other four. Adopting modern engineering practices that support reliability, security, operational excellence, and performance efficiency will help to enable better cost optimization in Azure. This includes using modern virtual machine sizes where virtual machines are needed and architecting for Azure PaaS components such as Azure Functions, Azure SQL, and Azure Kubernetes Service when virtual machines aren't required. Staying aware of new Azure services and changes to existing functionality will also help you recognize cost-optimization opportunities as soon as possible.

Looking forward

As we continue our journey, we’re focusing on refining our efforts and identifying new opportunities for further cost optimization in Azure. The continued modernization of our applications and solutions is central to reducing cost across our Azure footprint. We’re working toward ensuring that we're using the optimal Azure services for our solutions and building automated scalability into every element of our Azure environment. Using serverless and containerized workloads is an ongoing effort as we reduce our investment in the IaaS components that currently support some of our legacy technologies.

We’re also improving our methods for decentralizing optimization recommendations to enable our engineers and application owners to make the best choices for their environments while still adhering to central governance and standards. This includes automating the detection of anomalous behavior in Azure billing by using service-wide telemetry and logging, data-driven alerts, root-cause identification, and prescriptive guidance for optimization.
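One simple way to approach billing anomaly detection is a rolling comparison of each day's cost against the preceding days. The sketch below is a minimal example of that idea, not the detection logic we use: the 7-day window, the threshold of three standard deviations, and the sample costs are assumptions.

```python
from statistics import mean, stdev


def find_cost_anomalies(daily_costs, window=7, threshold=3.0):
    """Return indexes of days whose cost deviates sharply from the prior window."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: any change at all stands out.
            if daily_costs[i] != baseline:
                anomalies.append(i)
            continue
        if abs(daily_costs[i] - baseline) / spread > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative daily costs with one spike on day index 8.
costs = [100, 102, 98, 101, 99, 100, 103, 101, 180, 102]
spikes = find_cost_anomalies(costs)
print(spikes)
```

A flagged day would then feed the alerting and root-cause steps, for example by breaking the day's cost down by service or resource group.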

Azure optimization is a continuous cycle. As we further refine our optimization efforts, we learn from what we’ve done in the past to improve what we’ll do in the future. The Microsoft Digital Azure footprint will continue to grow in the years ahead, and our cost-optimization efforts will expand accordingly to ensure that our business is capturing every benefit that the Azure platform provides.

