Powering digital transformation at Microsoft with Modern Data Foundations

Apr 3, 2020


Microsoft Digital has begun eliminating data silos in favor of a single, unified source of trusted, connected enterprise data. This digital transformation replaces legacy systems and lays the foundation for deep insights and intelligent experiences for our customers, partners, and employees. Learn how we built Modern Data Foundations to democratize data access at Microsoft and power a new breed of apps and services.

Like many mature organizations, Microsoft is in the midst of a digital transformation. New capabilities are emerging—especially around how data is collected and used—that present an opportunity to remove limitations that have existed for decades in legacy systems, either by improving the legacy system or by building something new. Within Microsoft, the Microsoft Digital team is responsible for building and operating the internal IT and business systems on which Microsoft runs. It falls on us to level up those systems when they no longer meet Microsoft’s ambitious demands.

Throughout Microsoft, we’re transforming customer engagement, employee and partner productivity, and operational efficiency by infusing intelligence into our products, internal systems, and processes. Widely accessible, trusted, and connected enterprise data makes those intelligent experiences possible, and powers the wider digital transformation at Microsoft.

Data silos impede that vision. Data ecosystems that arose organically in previous decades to serve individual products, teams, and business functions limit that data’s potential to inform other projects. To create experiences that are predictive, prescriptive, and proactive, Microsoft Digital needed to create a single source of data to fuel those experiences. The Enterprise Data Strategy is the ongoing result of that vision.

The Enterprise Data Strategy was created with five goals in mind:

  • To build a single enterprise destination for high-quality, secure, trusted data.
  • To connect data from disparate silos, creating opportunities to leverage that data in ways not possible in a siloed approach.
  • To power responsible data democratization across the organization.
  • To drive efficiency gains in the processes used to access and use data.
  • To meet or exceed compliance and regulatory requirements without compromising Microsoft’s ability to create exceptional products and customer experiences.

Modern Data Foundations are a key component of our Enterprise Data Strategy. In this document we provide an overview of Microsoft’s Enterprise Data Strategy and examine each of the investments of the Modern Data Foundations.

The Microsoft Enterprise Data Strategy

The value of data is directly proportional to the number of people who can connect to and utilize it in meaningful ways. To evolve from unit-level intelligence to a more all-inclusive and connected enterprise intelligence, we have invested in the elements of the Enterprise Data Strategy shown in Figure 1.

An illustration of four stairsteps: Data foundations, Scorecards, Analytics, and ML/AI. Underlying all of these steps are “Governance” and “Culture.”
Figure 1. Building an infrastructure for more intelligent insights and experiences

Investments in these foundational elements have been iterative, and each builds on the last. The data foundation provides secure, high-quality, discoverable data. Scorecards measure the impact of that data, analytics extract insights from it, and machine learning (ML) and AI transform it into intelligent experiences. All these elements are linked by governance services designed to foster the responsible democratization of access to and use of data.

In parallel with these construction efforts, Microsoft Digital has fostered the internal cultural shifts that invariably arise in the course of such a transformation. How we capture, store, share, find, and interact with data is changing, and that has cultural ramifications that ripple throughout the organization.

Building modern foundations for trusted and connected data

To create a foundation that enables secure, high-quality, connected enterprise data to be easily discovered, accessed, and used responsibly by teams and users across the enterprise, we modernized our data foundations on five core pillars:

  • Trusted data services to ensure data quality, security, compliance, and governance
  • A single source of truth where connected enterprise data is collected, shaped into trusted forms, secured, made accessible, and conformed to applicable governance controls
  • Connected data products, including unified master data, data from disparate sources conformed to common enterprise data models, and entity hierarchies
  • Modern systems and tools to build and operate data products with sufficient guardrails to prevent improper data proliferation to edge systems and applications
  • A unified data catalog for democratized access to the data and data products that teams require to power their own digital transformation

Measuring what matters with metrics and scorecards

To measure the impact of our digital transformation efforts, we started by defining metrics that matter with scorecards consisting of:

  • Standardized, governed metric definitions for consistent calculations and reporting
  • Automated data pipelines for data collection and measurement
  • Reporting capabilities that are generated and refreshed automatically, and that include dashboards with trends, dimensional pivots, and self-service capabilities
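
As an illustration only, a standardized, governed metric definition can be thought of as a registry entry that pairs one metric name with exactly one owner and one calculation. The sketch below is a minimal example under that assumption, not the scorecard implementation described here; `MetricDefinition`, `MetricRegistry`, and the `hours_to_access` field are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MetricDefinition:
    """A governed metric: one name, one owner, one calculation (hypothetical)."""
    name: str
    owner: str
    unit: str
    compute: Callable[[List[dict]], float]


class MetricRegistry:
    """Holds the single approved definition for each metric name."""

    def __init__(self) -> None:
        self._definitions: Dict[str, MetricDefinition] = {}

    def register(self, definition: MetricDefinition) -> None:
        # Reject a second definition so teams cannot diverge silently.
        if definition.name in self._definitions:
            raise ValueError(f"metric already defined: {definition.name}")
        self._definitions[definition.name] = definition

    def evaluate(self, name: str, rows: List[dict]) -> float:
        # Every report calls the same function, so results stay consistent.
        return self._definitions[name].compute(rows)


def median_hours_to_access(rows: List[dict]) -> float:
    """Median of the hypothetical 'hours_to_access' field."""
    return float(statistics.median(row["hours_to_access"] for row in rows))


registry = MetricRegistry()
registry.register(MetricDefinition(
    name="median_hours_to_access",
    owner="data-governance",
    unit="hours",
    compute=median_hours_to_access,
))

sample = [{"hours_to_access": 2}, {"hours_to_access": 6}, {"hours_to_access": 4}]
print(registry.evaluate("median_hours_to_access", sample))
```

Because every dashboard and pipeline would evaluate the metric through the registry, a change to the calculation happens in one place and is picked up everywhere.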

Generating actionable insights with analytics

Actionable insights are generated from metrics using analytics, usually in the form of dimensional pivots and correlation/causation insights. These insights uncover distinct data states and trends to spur timely action, so our robust analytics capabilities include:

  • Trusted and connected enterprise data that directly relate to metrics that matter (defined with scorecards)
  • Self-service analytic tools that enable data analysts and domain experts to generate actionable insights by querying and/or visualizing data, creating analytics models, and exploring deeper data correlations
  • Tools for data democratization so data analysts and domain experts can publish and share their insights for benefit elsewhere in the company

Converting actionable insights to intelligent experiences with machine learning and AI

With actionable insights created, the next step is turning those insights into intelligent experiences that improve products, increase customer engagement and satisfaction, and boost employee productivity and efficiency. Our investments in machine learning and AI support those goals through predictive, prescriptive, and cognitive intelligence that bolster products and internal systems. This includes:

  • Infrastructure to enable data scientists to build and operate ML/AI models, with tools and services to cleanse and prep data when they need to integrate model-specific data with enterprise data
  • DevOps services and tools ensuring that when data scientists build, test, deploy, and operate ML/AI models, they do so in a secure, compliant, and scalable way
  • A repository of reusable ML/AI models and services that are available to non-data scientists for use in their own products and systems

Democratizing data responsibly with modern data governance

The scale of Microsoft’s digital transformation extends to all corners of the company, not just to data scientists. Data democratization is a driving force in this transformation, and such democratization comes with compliance challenges. Data breaches and compliance violations could not only damage Microsoft’s reputation as a trusted brand, but also create obstacles to achieving a responsible, data-driven culture. Accordingly, our approach to modern data governance includes:

  • Forming and staffing a data governance team to define and operationalize modern data governance across the enterprise. We’re using a hub and spoke model, with the data governance team forming the hub and data stewards in each team that publishes or uses data acting as the spokes that scale governance practices.
  • Governance processes that utilize either automation or human workflows (or some combination of the two), depending on the goals and context.
  • Strong, scalable technological foundations to embed governance practices in data management, data quality management, data security, data access management, compliance, and governance process automation.

Growing and scaling the data community

Since democratized access to data is a foundational goal of the digital transformation, non-technical users and teams will require support and training to evolve their use of data. We’re fostering a community within Microsoft to train teams and apply shared learnings. We’ve created working groups for data topics, training sessions, consultation forums, and shared sources for this purpose.

More broadly, we’ve also made data literacy a core tenet of our software engineering, product management, and design teams and processes. Within our product and service development teams, dedicated data professionals now work to foster a data-driven mindset and healthy data practices. Such structural change requires strong leadership, buy-in, and momentum. Within Microsoft, Microsoft Digital assumes this role and embraces the responsibilities it entails.

Implementing the Modern Data Foundations

Figure 2 below shows the components of the Modern Data Foundations that are powering Microsoft’s digital transformation:

An illustration depicting the technologies that make Microsoft’s Modern Data Foundations possible.
Figure 2. The Modern Data Foundations

The Enterprise Data Lake

Microsoft generates and makes use of data of varying volume, velocity, and quality. Use of data has arisen organically within individual teams to serve those teams’ specific needs and goals. However, lacking a single source of truth for enterprise data, teams and individuals often don’t have access to the data they require. When they do have access, the lack of common schemas and standardized processes for access and compliance also presents challenges in the discovery and use of data.

Consolidating and standardizing each data generation and publishing system created over the past several decades simply isn’t feasible. The Enterprise Data Lake (EDL)—built on Azure Data Lake, Azure Data Factory, and Azure Synapse Analytics—addresses this challenge by serving as the enterprise’s system of intelligence. There, data from across the enterprise is ingested, conformed, standardized, connected, democratized, and served for enterprise-wide applications in analytics as well as ML/AI.

Trusted data services

Data is valuable only insofar as it’s trusted. We create trusted data by making investments in data quality, security, compliance, and governance services, created by using and extending related Azure services.

Data quality services include probabilistic, rules-based data quality scanning, as well as closed-loop data publisher workflows. Standardized schemas and shared data and analytics models conform data for shared entities. Data from multiple source systems, including systems that share no common connector attributes, is unified to generate golden records.
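
To make the idea concrete, the following minimal sketch shows a rules-based scan followed by a merge of two matched records into a golden record. It is an illustration under stated assumptions, not the data quality service itself: the rule names, the field names, and the use of a fuzzy name comparison (Python's standard `difflib`) as a stand-in for matching records that share no join key are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Record = Dict[str, str]

# Rules-based checks: each rule returns True when the record passes.
# Both rules are hypothetical examples.
RULES: Dict[str, Callable[[Record], bool]] = {
    "name_present": lambda r: bool(r.get("name", "").strip()),
    "country_is_two_letters": lambda r: len(r.get("country", "")) == 2,
}


def scan(record: Record) -> List[str]:
    """Return the names of the rules the record fails."""
    return [name for name, rule in RULES.items() if not rule(record)]


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy name match, used here because the sources share no join key."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def golden_record(primary: Record, secondary: Record) -> Record:
    """Merge two matched records, preferring non-empty values from `primary`."""
    merged = dict(secondary)
    merged.update({key: value for key, value in primary.items() if value})
    return merged


crm_row = {"name": "Contoso Ltd", "country": "US", "segment": "Enterprise"}
billing_row = {"name": "contoso ltd.", "country": "US", "currency": "USD"}

both_clean = not scan(crm_row) and not scan(billing_row)
if both_clean and similar(crm_row["name"], billing_row["name"]):
    print(golden_record(crm_row, billing_row))
```

In a closed-loop workflow, the list returned by `scan` would be sent back to the data publisher for correction rather than silently dropped.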

Security and management services are built using Azure Active Directory (AAD), Azure Entitlements Lifecycle Management (ELM), and Azure Key Vault. These services are secure by design, an achievement made possible by abstracting the complexities of creating, managing, and operating security and access management capabilities. That abstraction allows for auto-provisioning, managing AAD security groups and memberships, Azure ELM access packages for data assets, and security keys and certificates for access management.

Compliance is handled by automated controls, processes, and audit reporting for regulatory standards such as the General Data Protection Regulation (GDPR) and the Sarbanes-Oxley Act (SOX).

Strong governance is achieved by operationalizing these and related processes as automated workflows, while still allowing for human touchpoints when needed. Data access management, for example, can be configured to automatically approve data access requests, but can also be configured to require manual review as necessary.
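
The configurable approval behavior can be sketched as a small routing function. This is a hypothetical illustration, not the access management service: the `classification` labels, the `requires_manual_review` flag, and the rule that an empty justification triggers review are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVED = "auto_approved"
    MANUAL_REVIEW = "manual_review"


@dataclass(frozen=True)
class DataAsset:
    name: str
    classification: str  # hypothetical labels: "general" or "confidential"
    requires_manual_review: bool = False  # per-asset override set by the owner


@dataclass(frozen=True)
class AccessRequest:
    requester: str
    asset: DataAsset
    justification: str


def route(request: AccessRequest) -> Decision:
    """Apply the asset's configured policy to an access request."""
    asset = request.asset
    # Sensitive or explicitly flagged assets always get a human touchpoint.
    if asset.requires_manual_review or asset.classification == "confidential":
        return Decision.MANUAL_REVIEW
    # An empty justification is also escalated rather than rejected outright.
    if not request.justification.strip():
        return Decision.MANUAL_REVIEW
    return Decision.AUTO_APPROVED


sales_summary = DataAsset("sales_summary", "general")
payroll = DataAsset("payroll", "confidential")

print(route(AccessRequest("analyst@example.com", sales_summary, "Quarterly report")))
print(route(AccessRequest("analyst@example.com", payroll, "Quarterly report")))
```

Keeping the policy on the asset rather than in the request handler lets each data owner choose between speed and review without changing the workflow code.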

The EDL also includes seamless, metadata-driven integrations with these services so data engineers and product developers can invoke and use them consistently. Efforts are underway to integrate the services with data source systems, too.

Modern services for building and operating data products

With Compute to Data and automated DevOps capabilities, we’ve reduced the time it takes to build, test, deploy, and operate data products.

With the new Compute to Data paradigm in place, we can build data products without moving or copying data out of the EDL. Data scientists can use standard SQL and SQL-based interfaces to query data and build products using Azure Synapse Analytics. Azure Databricks enables the use of comprehensive programming languages and runtimes, so data scientists can also build advanced analytics and ML/AI data products. Both services integrate with the EDL and enable the development of data products within it, without requiring data to be copied and proliferated.

Our ML Operations services are built on Azure Machine Learning Services and Azure DevOps to automate and democratize DevOps capabilities for ML/AI products. As a result, data scientists can productize secure, compliant, and scalable ML/AI models as intelligent services themselves, without software engineering skills.

The Data Catalog

The Enterprise Data Catalog is the single destination for our data consumers to find and gain access to the data and data products they need. The EDL Metadata service sends metadata published to the data lake to the catalog for discovery. Broader data sources—transactional data systems and master data, for example—are also registered in the catalog.
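
The register-then-discover flow can be sketched in a few lines. This is a simplified, hypothetical model of a catalog, not the Enterprise Data Catalog or the EDL Metadata service; `CatalogEntry`, `DataCatalog`, and the `source` labels are assumed names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    source: str  # hypothetical labels: "data_lake", "transactional", "master_data"
    owner: str
    tags: Tuple[str, ...] = ()


class DataCatalog:
    """In-memory stand-in for a single discovery destination."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Re-publishing metadata for the same asset replaces the old entry.
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[CatalogEntry]:
        """Case-insensitive match on asset name or tags, sorted by name."""
        needle = term.lower()
        matches = [
            entry for entry in self._entries.values()
            if needle in entry.name.lower()
            or any(needle in tag.lower() for tag in entry.tags)
        ]
        return sorted(matches, key=lambda entry: entry.name)


catalog = DataCatalog()
# Metadata published from the lake and from other systems lands in one place.
catalog.register(CatalogEntry("customer_orders", "data_lake", "sales-eng", ("orders", "customer")))
catalog.register(CatalogEntry("customer_master", "master_data", "mdm-team", ("customer",)))
catalog.register(CatalogEntry("invoice_ledger", "transactional", "finance-eng", ("finance",)))

for entry in catalog.search("customer"):
    print(entry.name, entry.source)
```

A consumer searching for "customer" sees assets from the lake and from master data side by side, which is the point of registering broader sources in the same catalog.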

Staged migration

To facilitate migration from multiple data silos to a single EDL, we’ve mapped a path that brings data and stakeholders into the EDL in phases. Phase one involves working with business units throughout Microsoft to gather information about their existing data systems and needs, then using the enterprise data management features in Azure to move data from individual business unit platforms to the EDL. Once their data is on the EDL, the business units can interact with the EDL to perform all the data capture and analysis they had previously been performing on their own platforms.

In phase two, remaining business units will migrate all their data to the EDL. At the end of phase two, all business units will be onboarded to the EDL as platform tenants. The EDL will provide a complete range of managed services for data ingestion, computation, and data cube construction and maintenance.

Finally, during phase three, Microsoft Digital will work with Microsoft operations teams and external parties to bring even more data into the EDL, including data from internal operational sources such as SAP ERP and Adobe Experience Cloud, as well as data from external sources such as partners, external data brokers, and others.

Defining success

The ultimate goal of these initiatives is to facilitate the creation of intelligent data products, and to reduce the time it takes to build, measure, and evolve those products.

Microsoft generates, captures, possesses, and analyzes vast amounts of data. By building the Modern Data Foundations, Microsoft Digital has democratized and consolidated access to that data throughout Microsoft. In doing so, we unlocked the raw materials needed to build more predictive, proactive, and intelligent experiences.

The efforts described here also create significant efficiency gains. In many cases, the time it takes to find and gain access to the enterprise data needed to build intelligent experiences has been reduced from weeks to hours (with the exception of access requests that require manual review, which now take days instead of weeks). These efficiency gains happen at the outset, beginning with data access, but have a ripple effect throughout the processes that follow.

We’ve developed baseline metrics to measure these impacts:

  • Time to build connected data products
  • Time to integrate connected data products into existing applications and services

Reduced cost is a byproduct of the greater digital transformation, but it is not necessarily the end goal. Rather than tackling spend in absolute terms, these initiatives optimize costs, tailoring spend to a business’s unique growth cycles. Post-transformation, spend is more efficient and proportional to growth.

Additionally, the elimination of multiple business unit platforms reduces the security and governance vulnerabilities that arise as an organization strives to manage so many separate platforms. In the process of this digital transformation, Microsoft reduces its financial and reputational risk simply by reducing the exposure created by having so many discrete systems to protect against a breach. Customers who follow a similar path can, of course, expect the same.

Conclusion

Microsoft is a mature organization; the company has evolved and expanded, both organically and by acquisition, for more than 40 years. That evolution has led to the creation and maintenance of many discrete silos of data, each one reflecting the needs of the internal organizations that generated, captured, or consolidated that data.

By gathering previously siloed datasets into a single Enterprise Data Lake, we created a single system of intelligence, a credible foundation upon which we can make well-informed, data-based decisions about the work we do and the products and services we deliver. Moreover, this strategy enables us to govern and secure our data assets more efficiently and effectively using the centralized enterprise data management capabilities of Azure. This results in lower operating costs and greater confidence in the integrity of data that we share with regulators and other stakeholders who have a duty to monitor certain activities of the corporation.

By using the full range of Microsoft Azure offerings, we’ve successfully worked with groups throughout the enterprise to migrate their operations and data onto this consolidated data platform. In outlining our strategy, we hope to provide a blueprint that other companies can follow to build their own Azure-based data foundations to gain the insights they need to deliver their products and services more effectively to their customers.