Microsoft uses SAP enterprise resource planning software to run mission-critical business functions like finance and human resources. In an on-premises model, physical computing resources are costly and may go unused. By moving our SAP systems to Microsoft Azure, we avoid maintaining unused resources: we scale our systems up and down for current and short-term needs. We’ve fine-tuned our capacity management processes for lower costs and more agility, scalability, and flexibility.

SAP is the backbone of our digital transformation. Like many enterprises, Microsoft uses SAP—the enterprise resource planning (ERP) software solution—to run most of our business operations. At Microsoft, we’re running SAP on Azure, the preferred platform for SAP and the optimal platform for digital transformation. 

We’ve optimized our SAP on Azure environment to gain business and operational benefits that make our SAP environment agile, efficient, and able to grow and change with our business. Optimizing our Azure environment has allowed us to:

  • Increase cost savings by using our Azure infrastructure more efficiently. 
  • Create a more agile, scalable, and flexible SAP on Azure solution. 

As of February 2018, Microsoft’s instance of SAP is 100 percent migrated to Azure. By optimizing SAP on Azure, we’re positioning our SAP environment to grow and change with our business needs. Additionally, we’re positioned to lead our digital transformation and empower everyone in our organization to achieve more. Azure makes SAP better.

SAP at Microsoft

Microsoft Digital, the organization that powers, protects, and transforms Microsoft, manages the SAP systems and applications (apps) for mission-critical business functions like finance, human resources, and supply chain management. SAP on Azure is a viable and trusted path to innovation in the cloud. The solution provides an agile infrastructure that minimizes downtime, risks, and costs and improves employee efficiency to power the digital transformation.

Each SAP system or app in the overall SAP landscape uses servers and hardware, computing resources (like CPU and memory), and storage resources. Each system also has separate environments, like sandbox and production. The resources required to run SAP can be costly in an on-premises model, where you have physical or virtualized servers that often go unused. 

Consider a typical on-premises system. The IT industry often sizes on-premises servers and storage infrastructure for the next three to five years, based on the expected maximum utilization and workload during the life span of an asset. But often, the full capacity of the hardware isn’t used outside of peak periods—or isn’t needed at all. Maintaining these on-premises systems is costly. 

With Azure, we avoid infrastructure underutilization and overprovisioning. We quickly and easily scale up and scale down our SAP systems for current and short-term needs, not for maximum load over the next three to five years. 

Capacity management decreases our costs and increases our agility

By managing capacity and sizing our SAP systems on Azure, we’ve experienced improvements in several areas: 

  • We have a lower total cost of ownership—we only pay for what we need, when we need it. We save on costs of unused hardware and ongoing server maintenance. 
  • We cut the core counts (number of CPUs in a machine) nearly in half—from 64-core physical machines to 32-core virtual machines for almost every server that we moved.
  • We are much more agile. We size for our needs now, and easily add or change our setup as needed to accommodate new functionality. For example, in a few minutes, we changed an 8-CPU virtual machine to 16 CPUs, doubled the memory, and added Azure premium storage to meet our short-term needs. Later, to save costs, we easily reverted to the original setup.

What does optimizing involve?

Optimizing involves calculating our hardware requirements like CPU resources, storage space, memory, input/output (I/O), and network bandwidth. When we optimize, we size for today. We assess our infrastructure, resources, and costs, and then size our systems as small as possible. We also ensure that there’s sufficient capacity to run business processes without causing performance issues during expected events like product releases or quarterly financial reporting. This approach lets us optimize our storage and computing power, giving us flexibility and on-demand agility. 

Tips for sizing

Sizing is an ongoing task because the load, business requirements, and behavior patterns can change at any time. Following are some considerations and tips, based on the process that we use: 

  • Design for easy scale-up and scale-out. Upsize only when needed, rather than scaling up or out months or quarters ahead of an actual business need. Start with the smallest possible computing capacity. It’s easy to add capacity later and resize before business processes change or before new processes go live in the environment. Autoscaling up and out brings additional benefits because it’s an automatic response to current conditions and usage patterns. Designing for easy scale-up also includes configuring the SAP HANA database and the SAP instances to dynamically adjust the amount of memory used or number of work processes depending on available resources in the virtual machine (VM). Keep the instance design simple: One SAP instance per VM. 
  • Figure out how many virtual machines a system needs. Our production and user acceptance testing (UAT) systems have multiple virtual machines, but for sandbox and quality assurance, we usually allocate single virtual machines. Sometimes our SAP app instances and database instance are on the same virtual machine. 
  • Don’t size for only CPUs and memory. Size for storage I/O and throughput requirements, too. 
  • Consider upstream and downstream dependencies in data movement and in app-to-app communication. Let’s say that you move an app into a public cloud. Adding 20 to 40 milliseconds in communication between on-premises and public cloud apps can affect dependencies and can also affect customers or your service-level agreements (SLAs) with business partners. 
  • Decide whether all your systems need Azure premium storage. With a short downtime, you can switch from Azure standard storage to premium storage without manually copying the data. Here is an example of how we’re using this feature: For our archiving system, we used Azure standard storage and small virtual machines. When we loaded data into the system, we temporarily doubled the memory and CPU, and added Azure premium storage for log file drives in the database. To save costs, after we loaded the data, we made the virtual machine smaller again and removed the premium storage drives. 
  • Decide if all apps must run continuously. Can some apps run eight hours a day, Monday through Friday? If so, you can save costs by snoozing. At night, or on weekends or holidays, you can often snooze development and sandbox systems if they aren’t in use. Also consider identifying test systems and potentially even some production systems for snoozing. Create a snooze schedule and provide key personnel with the ability to un-snooze systems on demand. If you have a separate business-continuity system and you snooze hardware for it, you pay only for storage, not for compute consumption. Also consider using the smallest size feasible. If there’s a disaster, resize to a bigger size before you start the production system, such as the database server, for business continuity. 
  • Keep monitoring and managing system and resource capacity. Make changes before issues occur. Monitor storage use, growth rates, CPU, network utilization, and memory resources that are used on virtual machines. Again, consider autoscaling up and out. If monitoring indicates that a system is consistently oversized, then adjust downward. 
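
The monitoring tip above can be sketched as a simple right-sizing check. This is a minimal illustration, not the tooling used at Microsoft: the function names, the use of the 95th percentile, and the 30 percent and 80 percent thresholds are all assumptions chosen for the example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def recommend(cpu_pct, mem_pct, low=30, high=80):
    """Suggest a sizing action from utilization samples (percent of VM capacity).

    low and high are hypothetical thresholds, not values from the article.
    """
    cpu = percentile(cpu_pct, 95)
    mem = percentile(mem_pct, 95)
    if cpu > high or mem > high:
        return "scale up"      # close to the limit on at least one resource
    if cpu < low and mem < low:
        return "scale down"    # consistently oversized on both resources
    return "keep"


if __name__ == "__main__":
    quiet = [12, 15, 9, 20, 18, 11, 14, 16, 10, 13]
    print(recommend(quiet, quiet))  # prints "scale down"
```

Using a high percentile rather than the average keeps short peaks, such as quarterly reporting, from being hidden by long idle periods. A real check would also look at storage I/O, throughput, and growth rates.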

Two common strategies for sizing SAP systems

We used two common strategies for sizing SAP systems, each in a different way and at different points in the optimization process. We used the SAP Quick Sizer at the start of our optimization process because it had a simple, web-based interface and it allowed us to prepare our sizing strategies from the start. We used reference sizing later, after we determined some context around our virtual-machine sizing and could provide virtual machines in Azure for reference. 

SAP Quick Sizer

If you don’t yet have systems or workloads in Azure, start with SAP Quick Sizer. It’s an online app that guides you on sizing requirements, based on your business needs. Quick Sizer is helpful for capacity and budget planning. The app has a questionnaire where you indicate the number of SAP users, how many transactions you expect, and other details. Quick Sizer then recommends a number for the SAP Application Performance Standard (SAPS), a measure of the processing capacity that you need, such as for a database server. 

If the recommended number is 80,000, you need to leverage servers with SAPS that add up to 80,000. 
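
The arithmetic behind that statement can be sketched as follows. The VM size names and SAPS ratings here are hypothetical placeholders for illustration; real ratings for certified Azure VM types are listed in the SAP Note referenced below.

```python
# Hypothetical SAPS ratings per VM size, for illustration only.
# Look up real ratings for certified Azure VM types in SAP Note 1928533.
HYPOTHETICAL_SAPS = {"small": 10_000, "medium": 20_000, "large": 40_000}


def servers_needed(target_saps, saps_per_server):
    """Smallest number of identical servers whose SAPS add up to the target."""
    if target_saps < 0 or saps_per_server <= 0:
        raise ValueError("target must be >= 0 and rating must be > 0")
    return -(-target_saps // saps_per_server)  # integer ceiling division


def plan_saps(plan):
    """Total SAPS of a plan given as {size_name: server_count}."""
    return sum(HYPOTHETICAL_SAPS[size] * count for size, count in plan.items())


def meets_target(plan, target_saps):
    """True if the plan's combined SAPS cover the recommended number."""
    return plan_saps(plan) >= target_saps


if __name__ == "__main__":
    print(servers_needed(80_000, 20_000))                       # prints 4
    print(meets_target({"large": 1, "medium": 2}, 80_000))      # prints True
```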

You can find more information about SAPS for Azure virtual machines in SAP Note #1928533 SAP Applications on Azure: Supported Products and Azure VM types (SAP logon required). 

You should keep a few considerations in mind when you’re using SAP Quick Sizer. There can be customization and variations of SAP systems, depending on business processes, which could change system behavior. Or you might have capabilities enabled for new SAP deployments or custom code for which no Quick Sizer exists. Also, in the past, hardware vendors guided customers on the servers that they needed and how to install them. With Azure, customers make their own decisions—for example, how to grow storage as data volume grows, or how to adjust CPU compute resources.

Reference sizing

After systems are live in Azure, reference sizing is the recommended method. With this approach, you need to look at the performance of systems you’ve already moved to Azure that have a similar load to the systems that you want to move. This comparison helps you estimate your sizing requirements accurately. For example, if you have an on-premises system that you want to move to Azure, and it’s three times larger than one of the systems that you already have on Azure, adjust the sizing based on systems you’ve already deployed in Azure, and then deploy the new system.

If it turns out that your estimate wasn’t accurate, it’s much easier and quicker to adjust CPU and memory resources in Azure than on-premises, by switching to a different virtual machine size. Adjusting the database on-premises is more difficult because you might need to buy servers with more CPU and memory. For on-premises, you must look at what you have, add a buffer, and consider the additional load that you’ll have in the next few years.
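
A first estimate from reference sizing might look like the sketch below. It assumes that resource needs grow linearly with load, which is a simplification, and it uses a hypothetical VM catalog; the `Sizing` type, the catalog entries, and the function names are illustrative and not part of any Azure or SAP API.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sizing:
    vcpus: int
    memory_gib: int


def scale_reference(observed: Sizing, load_factor: float) -> Sizing:
    """Scale the peak usage observed on a reference system by a load factor.

    Linear scaling is an assumption; treat the result as a starting point.
    """
    return Sizing(
        vcpus=math.ceil(observed.vcpus * load_factor),
        memory_gib=math.ceil(observed.memory_gib * load_factor),
    )


# Hypothetical catalog of VM sizes, for illustration only.
CATALOG = [Sizing(8, 64), Sizing(16, 128), Sizing(32, 256), Sizing(64, 512)]


def smallest_fit(required: Sizing, catalog=CATALOG) -> Optional[Sizing]:
    """Smallest catalog entry that covers both vCPU and memory needs, or None."""
    candidates = [
        s for s in catalog
        if s.vcpus >= required.vcpus and s.memory_gib >= required.memory_gib
    ]
    return min(candidates, key=lambda s: (s.vcpus, s.memory_gib), default=None)


if __name__ == "__main__":
    # Reference system in Azure peaks at 6 vCPUs and 40 GiB; the new system
    # is expected to carry three times the load.
    estimate = scale_reference(Sizing(6, 40), 3)
    print(estimate, smallest_fit(estimate))
```

If the estimate turns out to be wrong after go-live, the same lookup can be rerun with the newly observed usage to pick a different VM size.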

Some key technical considerations

When we integrate SAP with Windows Server and SQL Server, our main considerations are cost of ownership and low complexity. When you plan your integration and reference architecture, make sure that the technical landscape is easy and cost effective. With business-critical systems, it’s difficult to scale when you have an architecture whose maintenance requires highly skilled individuals, or when there are emergencies where you need business continuity.

For easy administration and operations, we use the same app design in all SAP production systems. We only adjust VM sizes and numbers based on the system-specific requirements.

Also, to avoid issues for customers who run SAP workloads on Azure, Microsoft certifies only certain Azure VM types. These VMs must meet memory, CPU, and ratio requirements, and they must support defined throughputs. To learn more about Azure VM types certified for SAP, review SAP certifications and configurations running on Microsoft Azure.

Technical implementation and technical capabilities

Figure 1 shows the Microsoft SAP ERP/ECC production system in Azure. By moving to Azure, we’ve gained agility and scalability on the SAP Application layer. We can scale the SAP Application layer up and down by increasing and decreasing the size and number of the VMs. The design and architecture have high-availability measures against single points of failure. So, if we need to update Windows Server or Microsoft SQL Server, perform infrastructure maintenance, or make other system changes, it doesn’t require much, if any, downtime. We implement infrastructure in Azure for our production systems with standard SAP, SQL Server, SAP HANA, Windows Server, and SUSE Linux high-availability features.

Illustration of the current Microsoft SAP ERP/ECC production system in Azure with an example of how we use Azure Availability Zones for our VMs.

Figure 1. Current SAP ERP/ECC production system in Azure

High availability and scalability

To ensure high availability, we use Azure Availability Zones and distribute our VMs across multiple zones (Figure 1 includes an example). If a problem arises in one zone, the system is still available.

All single points of failure are secured with clustering: Windows Server Failover Cluster for the Windows operating system and Pacemaker cluster for SUSE Linux. For databases, we use SQL Server Always On and SAP HANA System Replication (HSR). The databases are configured for a synchronous commit on both local HA nodes (no data loss occurs, and automatic failover is possible) and an asynchronous commit to the remote disaster recovery node. If an issue arises with the main database server, SAP will automatically reconnect to the local high availability node.

Because we can use the secondary database, we can upgrade software and SQL Server, roll back to previous releases, and do automatic failovers with no or minimal risk. 

For scalability and high availability of the SAP application layer, multiple SAP app server instances are assigned to SAP redundancy features such as logon groups and batch server groups. Those app server instances are configured on different Azure virtual machines to ensure high availability. SAP automatically dispatches the workload to multiple app-server instances per the group definitions. If an app server instance isn’t available, business processes can still run via other SAP app server instances that are part of the same group.

Rolling maintenance

The scale-out logic of SAP app server instances is also used for rolling maintenance. We remove one virtual machine, and the SAP app server instances running on it, from the SAP system without affecting production. After we finish our work, we add back the virtual machine, and the SAP system automatically uses the instances again. If high load occurs and we need to scale out, we add additional virtual machines to our SAP systems.
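
The rolling-maintenance pattern described above can be sketched as a small simulation. The VM names, the `patch` callback, and the `min_active` guard are hypothetical; in a real landscape, removing and re-adding app server instances is done with SAP and Azure tooling rather than a Python list.

```python
def rolling_maintenance(vms, patch, min_active=1):
    """Patch app server VMs one at a time while the rest keep serving load.

    vms: names of the VMs that host SAP app server instances (hypothetical).
    patch: callable that performs the maintenance work on one VM.
    min_active: how many VMs must stay in the system during maintenance.
    Returns a log of (event, vm, active_vms_after_event) tuples.
    """
    active = list(vms)
    log = []
    for vm in vms:
        if len(active) - 1 < min_active:
            raise RuntimeError("not enough app servers would remain active")
        active.remove(vm)                      # take the VM out of the SAP system
        log.append(("removed", vm, tuple(active)))
        patch(vm)                              # update OS, kernel, and so on
        active.append(vm)                      # add it back; SAP uses it again
        log.append(("added", vm, tuple(active)))
    return log


if __name__ == "__main__":
    for event in rolling_maintenance(["app1", "app2", "app3"], patch=print):
        print(event)
```

The guard on `min_active` reflects the idea that production keeps running through the other instances in the same group while one VM is out for maintenance.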

Automated shutdown of SAP systems

To ensure that VM costs are managed, we’ve established new policies for nonproduction SAP systems: The default setting for a system is that it’s unavailable. Users can start systems on demand by using a snooze application built on top of Microsoft Power Apps. The system is then available for 12 hours and automatically shuts down again unless the availability time is extended. Additionally, systems that are used regularly are assigned to a fixed availability schedule. For example, the schedule might be: Systems are shut down Friday evening and started again on Monday morning without user interaction. If a system is needed over the weekend, users can start it by using the snooze application.
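
The policy above can be expressed as a small availability check. This sketch is not the snooze application itself: the function names are hypothetical, and the exact shutdown and start hours (18:00 on Friday, 06:00 on Monday) are assumptions, because the article says only "Friday evening" and "Monday morning".

```python
from datetime import datetime, timedelta

ON_DEMAND_WINDOW = timedelta(hours=12)  # from the policy: available for 12 hours


def on_demand_available(now, started_at, extension=timedelta(0)):
    """True while an on-demand start (plus any extension) is still in effect."""
    if started_at is None:
        return False  # default state: the system is unavailable
    return started_at <= now < started_at + ON_DEMAND_WINDOW + extension


def scheduled_available(now, shutdown_hour=18, start_hour=6):
    """Fixed schedule: down from Friday evening until Monday morning.

    shutdown_hour and start_hour are assumed values for illustration.
    """
    weekday = now.weekday()  # Monday is 0, Sunday is 6
    if weekday == 4:
        return now.hour < shutdown_hour
    if weekday in (5, 6):
        return False
    if weekday == 0:
        return now.hour >= start_hour
    return True


def is_available(now, started_at=None, extension=timedelta(0), on_fixed_schedule=False):
    """A system is up if a user started it or its fixed schedule says so."""
    if on_demand_available(now, started_at, extension):
        return True
    return on_fixed_schedule and scheduled_available(now)


if __name__ == "__main__":
    saturday_noon = datetime(2024, 1, 6, 12, 0)
    started = datetime(2024, 1, 6, 10, 0)
    print(is_available(saturday_noon, on_fixed_schedule=True))                       # prints False
    print(is_available(saturday_noon, started_at=started, on_fixed_schedule=True))   # prints True
```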

Telemetry and monitoring using Azure Monitor

Moving SAP systems to Azure enabled easy integration with various Azure Monitor services for enhanced telemetry and monitoring. We approached telemetry by using a multilayer concept: 1) SAP Business Process layer, 2) SAP Application Foundational layer, 3) SAP Infrastructure layer, and 4) SAP Surrounding API layer.

Azure Log Analytics offers many standard metrics for the SAP Infrastructure layer, and it’s easy to integrate with SAP to collect custom metrics from the Application Foundational layer.

Implementing telemetry for the SAP business process and API layers is relatively challenging because doing so requires custom development. We use tools in the ABAP SDK to integrate and export data from SAP to Azure Monitor Application Insights.

The following Microsoft Inside Track articles provide more details about how we’ve set up technical and business-process telemetry.

Lessons learned

We keep learning and iterating as we optimize SAP for Azure in our environment. Here are some important lessons that we’ve learned: 

  • Ensure that you don’t over-provision your virtual machines, but make sure that you provision sufficient resources to avoid having to keep increasing your system resources weekly. 
  • Design and build your infrastructure and storage in Azure so that it can scale. Even for our development and test systems, we decided to use Azure premium storage because it offers low latency. That approach is optimal, because during project implementation, there are often multiple developers simultaneously using the development systems.
  • The types of virtual machine storage and Azure networking that we use are influenced by the lessons that we’ve learned about functionality. The Azure cloud platform is continually improved based on customer feedback and requirements.
  • Design for high availability in your production systems by using Windows Server Failover Clustering, SQL Server Always On, and SAP features like logon groups, remote function call groups, and batch server groups. 

Looking ahead

We’re excited about the decreased costs and increased agility that we’ve experienced so far in optimizing SAP for Azure. In the future, we plan to share more lessons that we learn as we move forward with post-migration improvements. For information on designing a migration initiative, review Strategies for migrating SAP systems to Microsoft Azure. Our future plans include:

  • Automating the sizing of our simpler systems and environments and developing autoscale. Automation and autoscale apply more to the middle tier—the SAP application layer—but we’d also like to autoscale up and down for the database layer and file servers. We want our systems to autoscale based on current conditions.
  • Adding more automation for business continuity. Right now, we use the same semiautomated business-continuity process in Azure that we used on-premises. If there’s a disaster, production fails over to a different Azure region. 
  • Exploring new business-continuity strategies and technology options as they apply to Azure.
  • Helping our customers whose SAP scenarios involve Azure Backup or Azure Data Encryption at rest answer questions such as:
    • Which policies do I apply in the SAP landscape? 
    • What do I encrypt? Do I use disk encryption or database encryption? 
    • Do I need the same backup methods for a 50-gigabyte database that I require for a 10-terabyte database?
  • Adding and using new Azure capabilities. We want to enable more SAP scenarios to run in Azure with better and faster storage, larger virtual machines, better network connectivity, and more Azure operational guidance.
