Microsoft Core Services Engineering and Operations (CSEO) created a monitoring platform using Microsoft Azure telemetry tools to provide insight for business processes that flow through one of the largest SAP instances in the world. The new platform helps Microsoft keep key business users informed of business-process flows. It also provides leadership with a comprehensive view of business-process health, and allows our engineering teams to create a more robust and efficient SAP environment.
Examining SAP at Microsoft
Like many enterprises, Microsoft uses SAP—the global enterprise resource planning (ERP) software solution—to run various business operations. Our SAP environment is critical to our business performance and we integrate it into most of our business processes. SAP offers functionality for enterprise services at Microsoft, such as human resources, finance, supply-chain management, and commerce. We use a wide variety of SAP applications, including:
- Enterprise Resource Planning (ERP)
- ERP Central Component (ECC)
- Global Trade Screening (GTS)
- Governance Risk and Control (GRC)
- Supply Chain Management (SCM)
- Revenue Management and Contract Accounting (RMCA)
- OEM Services (OER)
- Master Data Governance (MDG)
SAP provides an agile infrastructure, minimizing downtime, risks, and costs, and it improves employee efficiencies to power our digital transformation. Our enterprise SAP environment exists on a large scale. The size and scope statistics for ERP/ECC (our largest SAP application) include:
- 17 terabytes (TB) of highly compressed database storage (50 TB uncompressed)
- 600 virtual servers
- 110,000 internal users, and 8,000 named accounts
- 100 percent growth in the past two years
- 9 million dialog steps per day
- 300 million transactions per month
- 300,000 monitored batch jobs per month
- 0.4 seconds average user response time
- 99.998 percent infrastructure availability
SAP on Azure
We host 100 percent of our SAP environment in Azure. Running SAP on Azure allows us to leverage the breadth of Azure functionality and integrated SAP features as our business grows and changes. Additionally, Azure helps us combat infrastructure underutilization and overprovisioning, and allows us to quickly and easily scale up and scale down our SAP systems to meet our immediate business needs.
Migration to Azure
We completed our SAP migration to Azure in early 2018, which entailed moving all of our SAP assets to more than 600 Azure virtual machines and a host of cloud services. We approached the migration using both vertical and horizontal strategies. From a horizontal standpoint, we migrated systems in our SAP environment that were low risk—training systems, sandbox environments, and other systems that weren’t critical to our business function. We also looked at vertical stacks, taking entire parts of our SAP landscape and migrating them as a unified solution. We gained experience with both migration scenarios, and learned valuable lessons in the early migration stages that helped us smoothly transition critical systems later in the migration process.
Operating as Azure-native
At Microsoft, we develop and host all new SAP infrastructure and systems on Azure. We’re using Azure-based cloud infrastructure and SAP-native software-as-a-service (SaaS) solutions to increase our architecture’s efficiency, and to grow our environment with our business. The following figure represents our SAP landscape on Azure.
The benefits of SAP on Azure
SAP on Azure provides several benefits to our business, many of which have resulted in significant transformation for our company. Some of the most important benefits include:
- Business agility. With Azure’s on-demand SAP-certified infrastructure, we have faster development and test processes, shorter SAP release cycles, and the ability to scale instantaneously on demand to meet peak business usage.
- Efficient insights. SAP on Azure gives us deeper visibility across our SAP landscape. On Azure, our infrastructure is centralized and consolidated. We no longer have our SAP infrastructure spread across multiple on-premises datacenters.
- Efficient, real-time operations and integration. We can leverage integration with other Azure technologies such as Internet of Things (IoT) and predictive analytics to enable real-time capture and analysis of our business environment including areas such as inventory, transaction processing, sales trends, and manufacturing.
- Mission-critical infrastructure. We run our entire SAP landscape—including our most critical infrastructure—on Azure. SAP on Azure supports all aspects of our business environment.
Optimizing SAP on Azure
As our SAP systems change and grow, we’ve realized the need for additional solutions to create a more fluent and cohesive SAP environment on Azure. Monitoring is one of the key areas we’ve addressed. Once SAP was 100 percent hosted in Azure, we saw the potential for a more integrated, end-to-end monitoring environment. As we worked with an Azure-native SAP environment, we discovered aspects of the monitoring and reporting systems we used that could benefit from integration with Azure’s built-in monitoring, as some systems no longer fulfilled our requirements in the cloud. While we didn’t want to design our telemetry solution to run only for SAP on Azure, we did want to take advantage of the Azure platform and the tight integration between Azure and SAP.
Identifying potential for improved monitoring
As we examined the SAP environment on Azure, we found several key areas where we could improve the monitoring and reporting experience:
- Monitoring SAP from external business process components. External business process components had no visibility into SAP. Our monitoring within individual SAP environments provided valuable insight into SAP processes, but we needed a more comprehensive view. SAP is just one component among many in our business processes, and the owners of those business processes didn’t have any way to track their processes once they entered SAP.
- Managing and viewing end-to-end processes. It was difficult to manage and view end-to-end processes. We couldn’t capture the end-to-end process status to effectively monitor individual transactions and their progress within the end-to-end process chain. SAP was disconnected from end-to-end monitoring, which left a gap in our knowledge of the entire process pipeline.
- Assessing overall system health. We couldn’t easily assess overall system health. Our pre-existing monitoring solution didn’t provide a holistic view of the SAP environment and the processes with which it interacted. Our picture of the overall health of processes and systems was incomplete because information for SAP was missing, and issues that occurred within the end-to-end pipeline were difficult to identify and troubleshoot.
Our SAP on Azure environment was like a black box to many of our business-process owners, and we knew that we could leverage Azure and SAP capabilities to improve the situation. We could create a more holistic monitoring solution for our SAP environment in Azure and the business processes that defined Microsoft operations.
Creating a telemetry solution for SAP on Azure
The distributed nature of our business process environment led us to examine a broader solution—one that would provide comprehensive telemetry and monitoring for our SAP landscape, but also for any other business processes that comprised the end-to-end business landscape at Microsoft. Our implementation was driven by the following important goals:
- Integrate comprehensive telemetry into our monitoring.
- Move towards holistic health monitoring of both applications and infrastructure.
- Create a complete view of end-to-end business processes.
- Create a modern, standards-based structure for our monitoring systems.
Guiding design with business-driven monitoring and personas
We adopted a business-driven approach to building our monitoring solution. This approach examines systems from the end-user perspective, and in this instance, the personas represented three primary business groups: business users, executives, and engineering teams. Using the synthetic method, we planned to build our monitoring results around what these personas wanted and needed to observe within SAP and the end-to-end business process:
- Business users need visibility into the status of their business transactions as they flow through the Microsoft and SAP ecosystem.
- Executives need to ensure our business processes are flowing smoothly. If there are critical failures, they need to know before customers or partners do.
- Engineers need to know about business process issues before they impact business operations and lead to customer-satisfaction issues. They need end-to-end visibility of business transactions through SAP telemetry data in a common consumption format.
Creating end-to-end telemetry with the Unified Telemetry Platform
We’ve developed a telemetry platform in Azure that we call the Unified Telemetry Platform (UTP). UTP is a modern, scalable, reliable, and cost-effective telemetry platform that’s used in several different business process monitoring scenarios in Microsoft, including our SAP-related business processes.
UTP is built to enable service maturity and business process monitoring across CSEO. It provides a common telemetry taxonomy and integration with core Microsoft data monitoring services. UTP enables compliance and the maintenance of business standards for data integrity and privacy. While UTP is the implementation we chose, there are numerous ways to enable telemetry on Azure. For additional considerations, see Monitoring and diagnostics on the Azure documentation site.
Capturing telemetry with Azure Monitor
To enable business-driven monitoring and a user-centric approach, UTP captures as many of the critical events within the end-to-end process landscape as possible. Embracing comprehensive telemetry in our systems meant capturing data from all available endpoints to build an understanding of how each process flowed and which of the SAP components were involved. Azure Monitor and its related Azure services serve as the core for our solution.
Azure Application Insights
Application Insights provides an Azure-based solution with which we can dig deep into our Azure-hosted SAP landscape and pull out all necessary telemetry data. Using Application Insights, we can automatically generate alerts and support tickets when our telemetry indicates a potential error situation.
Azure Log Analytics
Infrastructure telemetry such as CPU usage, disk throughput and other performance-related data is collected from Azure infrastructure components in the SAP environment using Log Analytics.
Azure Data Explorer
UTP uses Azure Data Explorer as the central repository for all telemetry data sent through Application Insights and Azure Monitor Logs from our application and infrastructure environment. Azure Data Explorer provides interactive big data analytics at enterprise scale; we use the Kusto query language to stitch together the end-to-end transaction flow for our business processes, for both SAP and non-SAP processes.
Azure Data Lake
UTP uses Azure Data Lake for long-term cold data storage. This data is taken out of the hot and warm streams and kept for reporting and archival purposes in Azure Data Lake to reduce the cost associated with storing large amounts of data in Azure Monitor.
Constructing with definition using common keys and a unified platform
UTP uses Application Insights, Azure Data Explorer, and Azure Data Lake as the foundation for telemetry data. It unifies that data by using a common schema and key structure that ties telemetry data from various sources together to create a complete view of business-process flow. This telemetry hub provides a central point where telemetry is collected from all points in the business-process flow—including SAP and external processes—and then ingested into UTP. It’s then manipulated to create comprehensive business-process workflow views and reporting structures for our personas.
UTP created a clearly defined common schema for business process events and metrics based on a Microsoft-wide standard. That schema contains the metadata necessary for mapping telemetry to services and into processes and allows for joins and correlation across all telemetry.
As part of the common schema for business process events, the design includes a “cross-correlation vector (XCV)” value, common to all stored telemetry and transactions. By persisting a single value for the XCV and populating this attribute for all transactions and telemetry events related to a business process, we can stitch together the entire process chain related to an individual business transaction as it flows through our extended ecosystem.
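The stitching idea can be illustrated with a minimal Python sketch. It assumes each stored event carries an XCV and a timestamp; the field names (`xcv`, `event`, `source`, `time`) and the sample values are hypothetical and are not the actual UTP schema. In UTP itself this correlation is done with Kusto queries in Azure Data Explorer.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical telemetry events; field names and values are illustrative only.
events = [
    {"xcv": "PO-1001", "event": "OrderCreated", "source": "SAP", "time": "2019-03-01T10:00:05"},
    {"xcv": "PO-1001", "event": "IDocCreated", "source": "SAP", "time": "2019-03-01T10:00:01"},
    {"xcv": "PO-2002", "event": "IDocCreated", "source": "SAP", "time": "2019-03-01T10:02:00"},
    {"xcv": "PO-1001", "event": "OrderSubmitted", "source": "PartnerPortal", "time": "2019-03-01T09:59:58"},
]


def stitch_by_xcv(events):
    """Group events by their XCV and order each group by timestamp."""
    chains = defaultdict(list)
    for event in events:
        chains[event["xcv"]].append(event)
    for chain in chains.values():
        chain.sort(key=lambda e: datetime.fromisoformat(e["time"]))
    return dict(chains)


chains = stitch_by_xcv(events)
for xcv, chain in chains.items():
    # One line per business transaction, showing SAP and non-SAP steps in order.
    print(xcv, "->", [f'{e["source"]}:{e["event"]}' for e in chain])
```

Because every event for a transaction shares one XCV, a single grouping key is enough to rebuild the chain, regardless of which system emitted each event.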
Implementing UTP in SAP on Azure
The first step in enabling our telemetry platform was to create a reusable custom method and configuration table to drive consistent creation of the telemetry payloads. The configuration table defines the fixed structure of the payload according to the UTP standards.
The method then allows the calling application to pass an application-specific payload to populate the dynamic properties section of the telemetry events payload, and then adds SAP standard elements such as the event date and time, and system identifier. This method can then be called directly from any ABAP code, in either synchronous or asynchronous modes.
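The following Python sketch illustrates the pattern described above: a configuration lookup supplies the fixed part of the payload, the caller supplies the dynamic properties, and the shared method adds standard elements. The real implementation is an ABAP method; every name here (`CONFIG_TABLE`, `build_telemetry_payload`, the field names, the `PRD` system identifier) is a hypothetical stand-in, not the UTP payload definition.

```python
import json
from datetime import datetime, timezone

# Hypothetical stand-in for the configuration table: fixed fields per business process.
CONFIG_TABLE = {
    "order_to_cash": {
        "schemaVersion": "1.0",
        "businessProcess": "OrderToCash",
        "component": "SAP-ECC",
    },
}


def build_telemetry_payload(process_key, event_name, xcv, dynamic_properties,
                            system_id="PRD", now=None):
    """Merge fixed config fields, standard elements, and caller-supplied properties."""
    fixed = CONFIG_TABLE[process_key]  # raises KeyError if the process is not configured
    now = now or datetime.now(timezone.utc)
    payload = dict(fixed)  # copy so the shared config is never mutated
    payload.update({
        "eventName": event_name,
        "xcv": xcv,
        "eventDateTime": now.isoformat(),  # standard element added by the shared method
        "systemId": system_id,             # standard element added by the shared method
        "properties": dict(dynamic_properties),  # application-specific section
    })
    return json.dumps(payload)


payload_json = build_telemetry_payload(
    "order_to_cash", "IDoc_Created", "TX-42", {"idocNumber": "0000000123"}
)
print(payload_json)
```

Keeping the fixed structure in configuration means a calling application only has to decide what goes into the dynamic properties section.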
For example, in most business processes in our ERP, we use SAP business process events to trigger our telemetry events. The business process events share a custom check routine framework built using SAP Business Rule Framework plus; then custom receiver classes build the dynamic properties of the payload and call the shared telemetry class.
When each event in the workflow is processed in SAP, the JSON payload is passed to Application Insights using an external REST service call, which connects to the UTP framework. The following figure contains an example from our non-delivery order-to-cash process.
In the workflow, each numbered step represents a business process event that is generated within the process flow and then sent to UTP.
When we receive an inbound SAP IDoc for one of our integration partners, the process triggers a check routine to see if that inbound process is relevant for telemetry. If relevant, the system raises the event IDoc_Created. The method then dynamically builds the payload based upon the requirements for the specific business process. The cross-correlation vector (XCV) is set based on the transaction key of the calling system. The XCV is then passed and persisted in each of the subsequent transactions.
For the order-to-cash process, the IDoc is processed through the standard program. At the end of the program, the SAP standard process is triggered to raise the event. This event publishes the IDoc-processed event, with the status of the individual IDoc, XCV, and metrics.
If the IDoc processes successfully, it will commit the sales order to the database. When the sales order is created, the program generates the standard business process event and then triggers the Order Create event, passing the order create time, the XCV, and other application-specific attributes.
An event is triggered when the billing document is created, passing the billing document create time, the XCV, and other application-specific attributes.
When the billing document is posted to accounting, an event is raised, passing the billing document create time, the XCV, and other application-specific attributes. This enables detection of any “document not posted to accounting” issues in near-real time.
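The steps above can be sketched as a small Python simulation: a relevance check gates telemetry, the XCV is set once from the calling system's transaction key, and every later event carries the same value. This is an illustration only. The partner list, the function names, and all event names other than IDoc_Created are assumptions, and the real logic runs in ABAP inside SAP.

```python
# Hypothetical set of integration partners whose inbound IDocs are relevant for telemetry.
RELEVANT_PARTNERS = {"PARTNER_A"}


def is_relevant(idoc):
    """Check routine: decide whether this inbound IDoc should produce telemetry."""
    return idoc["partner"] in RELEVANT_PARTNERS


def run_order_to_cash(idoc, idoc_ok=True):
    """Simulate the event sequence for one inbound IDoc and return the emitted events."""
    emitted = []
    if not is_relevant(idoc):
        return emitted

    # The XCV is set once, from the transaction key of the calling system,
    # and then persisted on every subsequent event.
    xcv = idoc["transaction_key"]

    def raise_event(name, **attrs):
        emitted.append({"event": name, "xcv": xcv, **attrs})

    raise_event("IDoc_Created")
    raise_event("IDoc_Processed", status="success" if idoc_ok else "error")
    if not idoc_ok:
        return emitted  # no sales order is committed when IDoc processing fails

    raise_event("Order_Created")
    raise_event("BillingDocument_Created")
    raise_event("BillingDocument_PostedToAccounting")
    return emitted


flow = run_order_to_cash({"partner": "PARTNER_A", "transaction_key": "TX-42"})
for event in flow:
    print(event)
```

A failed IDoc stops the sequence after the processed event, which is what later makes a missing downstream step visible in the telemetry.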
By tracking the business process at this level, we gain numerous insights into business process flow. With UTP integration into SAP in the above process, we can:
- Detect process failures when prescribed business process steps are not executed.
- Measure processing time between steps, which allows us to measure our SLAs more accurately.
- Monitor transaction volumes over time more accurately, which enables us to detect bottlenecks and tune our system and processing steps.
- Track individual transactions from external systems through each business process step in SAP.
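As a rough illustration of the first two capabilities, the Python sketch below takes the events recorded for one XCV, reports which expected steps never happened, and computes the elapsed time between adjacent steps. The step names, field names, and timestamps are hypothetical; in practice this kind of analysis would be expressed as Kusto queries over the stored telemetry.

```python
from datetime import datetime

# Hypothetical ordered list of steps expected in the process.
EXPECTED_STEPS = [
    "IDoc_Created",
    "IDoc_Processed",
    "Order_Created",
    "BillingDocument_Created",
    "BillingDocument_PostedToAccounting",
]


def analyze_chain(chain):
    """Return (missing steps, seconds elapsed between adjacent recorded steps)."""
    times = {e["event"]: datetime.fromisoformat(e["time"]) for e in chain}
    missing = [step for step in EXPECTED_STEPS if step not in times]
    durations = {}
    for first, second in zip(EXPECTED_STEPS, EXPECTED_STEPS[1:]):
        # Only measure a gap when both adjacent steps were actually recorded.
        if first in times and second in times:
            durations[(first, second)] = (times[second] - times[first]).total_seconds()
    return missing, durations


# Sample chain for one transaction; the posting-to-accounting event never arrived.
chain = [
    {"event": "IDoc_Created", "time": "2019-03-01T10:00:00"},
    {"event": "IDoc_Processed", "time": "2019-03-01T10:00:02"},
    {"event": "Order_Created", "time": "2019-03-01T10:00:05"},
    {"event": "BillingDocument_Created", "time": "2019-03-01T10:30:00"},
]

missing, durations = analyze_chain(chain)
print("Missing steps:", missing)
for (first, second), seconds in durations.items():
    print(f"{first} -> {second}: {seconds:.0f}s")
```

A real check would also need a time window before declaring a step missing, since a later step may simply not have happened yet.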
Reporting and dashboarding with Microsoft Power BI
Power BI provides the engine behind our reporting and dashboarding functionality. We’ve built our reporting around business-driven monitoring, and we’ve constructed standard views and dashboards that offer visibility into important areas for each of our key personas. Our dashboards are constructed from Kusto queries within the UTP environment, which are automatically translated into Power BI’s “M” language. For each persona, we’ve enabled a different viewpoint and altitude of our business process that allows them to view the SAP monitoring information most critical to them.
Our UTP provides benefits across our SAP and business-process landscape. We’ve created a solution that facilitates end-to-end business-process monitoring, which enables our key personas to do their jobs better.
Benefits for each persona include that:
- Business users no longer need to create service tickets to get the status of SAP transaction flows. They can see our business processes from end to end, including SAP transactions and external processes.
- Executives can trust that their business processes execute seamlessly and that any errors are proactively addressed with no impact to customers or partners.
- Engineers no longer need to check multiple SAP transactions to investigate business-process issues and identify in which step the business process failed. They can improve their time-to-detect and time-to-resolve numbers with the right telemetry data and avoid business disruption for our customers.
The benefits of our UTP extend across Microsoft by providing:
- End-to-end visibility into business processes. Our UTP provides visibility into business processes across the organization, which then facilitates better communication and a clearer understanding of all parts of our business. We have a more holistic view of how we’re operating, which helps us work together to achieve our business goals.
- Decreased time to resolve issues. Our visibility into business processes informs users at all levels when an issue occurs. Business users can see the interruption in their workflow, executives are notified of business-process delays, and engineers can identify and resolve issues. This all occurs before customers are affected.
- More efficient business processes. Greater visibility leads to greater efficiency. We can surface issues to stakeholders quickly, everyone involved can recognize areas for potential improvement, and we can monitor modified processes to ensure that improvement is happening.
We learned several important lessons with our UTP implementation for SAP on Azure. These lessons helped guide our UTP development, and they’ve given us best practices to leverage in future projects, including:
- Perform a proper inventory of internal processes. You must be aware of events within a process before you can capture them. Performing a complete and informed inventory of your business processes is critical to capturing the data required for end-to-end business-process monitoring.
- Build for true end-to-end telemetry. Capture all events from all processes and gather telemetry appropriately. Data points from all parts of the business process—including external components—are critical to achieving true end-to-end telemetry.
- Build for Azure-native SAP. SAP on Azure is easier, and instrumenting SAP processes becomes more efficient and effective when SAP components are built for Azure.
- Encourage data-usage models and standards across the organization. Data standards are critical for an accurate end-to-end view. If data is stored in different formats or instrumented differently in various parts of the business process, the end reporting results won’t accurately represent the state of the business process.
We’re continuing to evaluate and improve UTP as we discover new and more efficient ways to track our business processes in SAP. Some of our current focus areas include:
- Machine learning for predictive analytics. We’re using machine learning and predictive analytics to create deeper insights, understand our current SAP environment more completely, and anticipate future growth and change.
- Actionable alerting. We’re using Azure Monitor alerts to create service tickets, generate SLA alerts, and provide a robust notification and alerts system. We’re working toward linking detailed telemetry context into our alerting system to create intelligent alerting that enables quicker and more accurate identification of potential issues within the SAP environment.
- Telemetry-based automation. We’re using telemetry to enable automation and remediation within our environment. We’re creating self-healing scenarios to automatically correct common or easy-to-correct issues to create a more intelligent and efficient platform.
We’re continually refining and improving business-process monitoring of SAP on Azure with UTP. It has enabled us to keep key business users informed of business process flow, provided a complete view of business process health to our leadership, and helped our engineering teams create a more robust and efficient SAP environment. Telemetry and business-driven monitoring with UTP have transformed the visibility we have into our SAP on Azure environment, and our continuing journey toward deeper business insight and intelligence is making our entire business better.