Microsoft internal SAP workload gets a telemetry boost with Azure

May 2, 2019

For the first time, Microsoft has end-to-end visibility into the millions of business processes that it runs through SAP every day.

Microsoft is using a new suite of Microsoft Azure telemetry tools to gain insight into how to better manage expense reports, time away reporting, purchase order creation, and similar business processes that get routed through one of the largest SAP instances in the world. Before Microsoft moved its SAP workload into Azure and before it started using new Azure telemetry tools, there was no way to connect such transactions inside and outside of SAP.

“This made the process feel like a black box to Microsoft employee users, and to our engineers, who needed to figure out what was happening so they could connect the dots and improve our services,” says Enda Sullivan, a senior program manager for the internal SAP implementation at Microsoft Core Services Engineering and Operations (CSEO).

Because the SAP processes were not connected end-to-end, it was difficult to help Microsoft employees when something went wrong.

“If there was a problem, it would require the user to engage with multiple teams, both SAP and non-SAP, to understand the status of a request,” Sullivan says. “Beyond that, the support teams often wouldn’t have visibility to failed transactions between the various system steps. As a user, I should never have to call a helpdesk, the failure should be detected by the service telemetry and monitoring, and be resolved before I’m even aware.”

Now, through Azure, the team has real-time data that flags SAP process issues as they come up, so the issues can be resolved before users realize something went wrong.

Telemetry comes a long way

Cory Delamarter poses for a photo in a hallway near his office.
Cory Delamarter manages implementation of the Unified Telemetry Platform (UTP) at Microsoft. Delamarter is a principal program manager in Core Services Engineering and Operations (CSEO). (Photo by Jim Adams | Showcase)

Historically, the use of telemetry to guide how companies like Microsoft use SAP was spotty at best, says Cory Delamarter, a principal program manager tasked with driving telemetry design and implementation standards across CSEO, which provides IT services for all of Microsoft.

“It’s great that Azure is giving us these new tools to work with, but this is something we were already starting to tackle,” he says. “The opportunity and value of getting all our data in one place is too high to not solve this problem.”

Delamarter’s team is working to bring a consistent approach to telemetry across all of CSEO, an effort dubbed the Unified Telemetry Platform (UTP).

“When we started to architect a new solution for telemetry, it needed to be scalable, reliable, and cost-effective, but most importantly a single common platform across the org,” he says. “Essentially, we are consolidating tools and data stores.”

Standardization was a must, he says.

“We had to design in flexibility that would support more than a system for monitoring a database or website,” Delamarter says. “The power of unified telemetry is the ability to solve problems across boundaries, or service health, which supports the higher-level business processes.”

The companywide approach to telemetry is being built around Azure Monitor Application Insights, Azure Data Lake (Gen2) and Azure Data Explorer. “Application Insights provides the ability to ingest and organize incoming telemetry data, while Azure Data Explorer gives us the ability to aggregate and support queries across very large data sets stored in our data lake,” he says.

Graphic showing Unified Telemetry Platform from end-to-end.
The Unified Telemetry Platform (UTP) system ingests data from applications and infrastructure across the Microsoft internal environment. Data is transformed into a standard schema using Application Insights and housed in Azure Data Lake for cold storage. Azure Data Explorer provides the ability to query the datasets and build dashboards with Power BI in addition to Application Insights and Azure Monitor.

Taking away the mystery

Getting more insight out of Microsoft’s SAP workload really came down to taking the mystery away.

“For SAP, we had to get out of the four walls and shine a light so it’s no longer a black box,” says Aron Stern, a senior software engineer inside CSEO who is responsible for the Azure architecture of the company’s SAP infrastructure.

First, the team needed to simplify everything.

“We thought about the solution in two parts, telemetry, the raw metadata being emitted by our applications, then monitoring, reporting on service health through dashboards and alerting,” Stern says. “From there we separated the data into three layers—infrastructure, application, and business process.”

Then the team needed to implement a bit of customization.

“We built a small custom application using a few Azure tools,” Stern says. “That allowed us to convert our application and business process telemetry events from SAP into common schema—this allowed us to stitch our transactions together so we could get the end-to-end view we were looking for.”
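The article does not describe the custom application's internals, so the following is only a minimal sketch of the general idea Stern outlines: convert events from different systems into one common schema, then stitch them into an end-to-end view by a shared identifier. Every name here is a hypothetical chosen for illustration, including the raw field names (`REQUEST_ID`, `RC`, `TS`, `requestId`), the `TelemetryEvent` fields, and the assumption that both systems carry the same request ID. None of it reflects Microsoft's actual schema or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TelemetryEvent:
    """Hypothetical common schema shared by all sources."""
    correlation_id: str  # assumed shared ID that ties one transaction together
    layer: str           # "infrastructure", "application", or "business_process"
    source: str          # system that emitted the event
    step: str            # name of the step within the transaction
    status: str          # "ok" or "failed"
    timestamp: datetime


def from_sap(raw: dict) -> TelemetryEvent:
    """Map an assumed SAP-style record onto the common schema."""
    return TelemetryEvent(
        correlation_id=raw["REQUEST_ID"],
        layer="business_process",
        source="sap",
        step=raw["STEP"],
        status="ok" if raw["RC"] == 0 else "failed",
        timestamp=datetime.strptime(raw["TS"], "%Y%m%d%H%M%S"),
    )


def from_web(raw: dict) -> TelemetryEvent:
    """Map an assumed non-SAP front-end record onto the common schema."""
    return TelemetryEvent(
        correlation_id=raw["requestId"],
        layer="application",
        source="expense_portal",
        step=raw["operation"],
        status="ok" if raw["success"] else "failed",
        timestamp=datetime.fromisoformat(raw["time"]),
    )


def stitch(events):
    """Group events by correlation ID and order each group by time."""
    by_id = defaultdict(list)
    for event in events:
        by_id[event.correlation_id].append(event)
    return {
        cid: sorted(group, key=lambda e: e.timestamp)
        for cid, group in by_id.items()
    }


def first_failure(trace):
    """Return the earliest failed event in a stitched trace, or None."""
    return next((e for e in trace if e.status == "failed"), None)


# Made-up sample data: two expense requests crossing a portal and SAP.
events = [
    from_sap({"REQUEST_ID": "EXP-1", "STEP": "post_document", "RC": 4,
              "TS": "20190502090005"}),
    from_web({"requestId": "EXP-1", "operation": "submit_expense",
              "success": True, "time": "2019-05-02T09:00:00"}),
    from_web({"requestId": "EXP-2", "operation": "submit_expense",
              "success": True, "time": "2019-05-02T09:01:00"}),
    from_sap({"REQUEST_ID": "EXP-2", "STEP": "post_document", "RC": 0,
              "TS": "20190502090110"}),
]
traces = stitch(events)
```

With events in one shape, a call such as `first_failure(traces["EXP-1"])` can point to the step and system where a request broke, whether that step ran inside or outside SAP.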

The team then ingested all the SAP data into its Application Insights instance and fed that into an Azure Data Lake for cold storage and Power BI reporting.

This shift has enabled a powerful transformation, says Blake Barrow, a principal software engineer inside CSEO. “Business process telemetry is what enables us to measure SLAs and provide transparency to our users,” he says.

Everyone who has access to this new telemetry is enjoying a wave of new insights.

“Our employee users now have better insight into any transaction they have on SAP,” Barrow says. “Our engineering teams are getting the data they need to detect issues before they can create downstream problems. Business executives have access to dashboards that provide near real-time status of transaction volumes, so they look for trends and do health checks on their programs.”

All are changes that wouldn’t be possible without transforming the way Microsoft approaches telemetry. “These are the kinds of improvements that can help us all have more impact,” he says. “This is what digital transformation is all about.”

Barrow, Stern, and Sullivan will be presenting how Microsoft is using UTP to gain insights on its SAP workload at the SAP SapphireNow Conference on Wednesday, May 8th. Go here to learn more about their session. Go here to learn more about how Microsoft migrated its SAP workload to Microsoft Azure and much more on how the company is managing its SAP workload.
