Turning to DevOps engineering practices to democratize access to data at Microsoft

Oct 19, 2020

Data products, and the practices used to develop them, have evolved at Microsoft.

Data product development and operations now embrace the philosophies, practices, and tools of modern software development. It’s called data product DevOps, and it involves applying modern software engineering practices to build, deploy, and operate impactful data products as reliable services.

“The change accelerates the velocity at which actionable insights and intelligence can be extracted from an organization’s data and applied to digitally transform its products and operations,” says Guru Prasad, a principal engineering lead on the Data team in Microsoft Digital.

The Data team and Microsoft Digital have been defining and applying data product DevOps practices to scale the development and operations of reliable data products. Microsoft Digital has taken its learnings from deploying these practices across the company and has used them to build DevOps services for data products.

This evolution in technology enables data professionals, like data engineers, data analysts, and data scientists, to build and operate data products as reliable and scalable services, all without professional software engineers.

But first, what is a data product?

Data products are apps or tools you can use to power your digital transformation by extracting insights and intelligence from data.

“Data products can be datasets that enrich data from across an enterprise to enable connected enterprise insights and intelligence,” Prasad says. “They can then be integrated throughout products and business systems as dashboards and reports, analytics models, machine learning and AI models, and data APIs.”

Building and operating data products

At the end of the day, data products empower digital transformation.

To integrate with user-facing products and business services, data products need to be operated with the reliability and scalability of enterprise-grade software.

How a data product is built is a key determinant for how well it can be operated.

“Such engineering entails applying modern software engineering practices in building and operating data products,” Prasad says.

But a shift in mindset and practices is needed to address the fundamentals of operations before building and deploying a data product. This change requires an amalgamation of data engineering, software engineering, and operations best practices.

“Data product operations include deployment automation, infrastructure optimization, health monitoring and alerting, and proactive governance controls,” Prasad says.

Automation combines engineering and operations best practices from the start, not retroactively.

“The practitioners who build and operate data products are generally data engineers, data analysts, and data scientists, not software engineers,” Prasad says. “We needed a solution for building and operating data products that are usable by data practitioners.”

Democratizing excellence in building and operating data products

So how can an organization enable data practitioners to build and operate reliable and scalable data products?

“We apply DevOps practices prevalent in software engineering to building and operating data products, and invest in democratizing such practices with tools and services that automate them for data practitioners,” Prasad says.

Although applying DevOps practices in building and operating data products has many advantages, a few key enterprise-scale benefits stand out. They are data governance, security and compliance, infrastructure optimization, and observability to maximize the value outcomes of data products.

Data governance

Governance controls are essential to proactively avoid duplicating investments in data products, which creates data fragmentation. Fragmentation increases data exposure risks and challenges in synchronizing multiple versions of the same data and their applications.

“It’s essential to catalog all data products with rich metadata that describes them and offer lineage tracing,” Prasad says. “Such metadata and lineage tracing enable the proactive detection and mitigation of data products and data fragmentation. Knowing the data sources used to build data products and the applications that use them is important for an enterprise.”

Generating such metadata and lineage tracing is foundational to governing an enterprise’s data estate. Unfortunately, data practitioners often treat it as an afterthought.

As a consequence, enterprises commonly require reactive detection of gaps and manual patching to catalog and govern data and data products. These reactive measures aren’t scalable when building and evolving data products at the velocity needed to impact digital transformation outcomes.

“Abstracting and automating the generation of metadata and lineage tracing for data products is an essential data product DevOps capability,” Prasad says. “Doing so enables consistency and completeness in the metadata for data products and their lineages, thereby establishing the foundations for effective governance in preventing data fragmentation and misuse.”
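
As a rough illustration of what automating metadata and lineage capture could look like, the following Python sketch registers a data product in a catalog at the moment it is defined, so the practitioner never writes the catalog entry by hand. It is a minimal sketch under stated assumptions, not a description of Microsoft's services: the `Catalog` class, the `register` decorator, and every product and source name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    name: str
    owner: str
    sources: tuple  # names of upstream datasets or data products
    description: str = ""


class Catalog:
    """Hypothetical in-memory catalog; a real one would be a shared service."""

    def __init__(self):
        self.entries = {}

    def register(self, owner, sources, description=""):
        """Decorator that records metadata and lineage when a product is defined."""
        def decorator(build_fn):
            self.entries[build_fn.__name__] = CatalogEntry(
                name=build_fn.__name__,
                owner=owner,
                sources=tuple(sources),
                description=description,
            )
            return build_fn
        return decorator

    def upstream(self, name):
        """Return every source reachable from `name`, direct or transitive."""
        seen = set()
        stack = [name]
        while stack:
            entry = self.entries.get(stack.pop())
            if entry is None:
                continue  # a raw source with no recorded lineage of its own
            for src in entry.sources:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen

    def possible_duplicates(self):
        """Group products built from the same set of direct sources."""
        by_sources = {}
        for entry in self.entries.values():
            by_sources.setdefault(frozenset(entry.sources), []).append(entry.name)
        return [sorted(names) for names in by_sources.values() if len(names) > 1]


catalog = Catalog()


@catalog.register(owner="sales-analytics", sources=["crm_accounts", "billing_invoices"])
def customer_revenue():
    pass  # the build logic for the data product would go here


@catalog.register(owner="finance", sources=["customer_revenue"])
def revenue_forecast():
    pass


@catalog.register(owner="marketing", sources=["billing_invoices", "crm_accounts"])
def account_revenue():
    pass


print(catalog.upstream("revenue_forecast"))
print(catalog.possible_duplicates())
```

Because registration happens as a side effect of defining the product, the lineage graph stays complete by construction. Here, `possible_duplicates` flags `account_revenue` and `customer_revenue` as candidates for review because they draw on the same sources, which is one simple signal of fragmentation.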

Security and compliance

Securing data products and ensuring that their applications are compliant with enterprise and broader regulatory standards are essential fundamentals to responsibly democratize data.

Security and compliance entails adopting and adapting to the evolving standards for information and security practices in protecting access to data products. These guardrails ensure compliant use of the data product.

“Implementing security and compliance fundamentals for data products is a deeply technical and operations-intensive undertaking,” Prasad says. “Data product DevOps services abstract our data practitioners from this complexity and enable them to build and operate secure and compliant data products without needing to be technical experts in these domains.”

Data product DevOps services automate the standards for security, access management, and compliance. This automation enables data practitioners to build and operate data products that are enterprise grade in these fundamentals.

“These capabilities include automating modern security practices such as Zero Trust by default, operationalizing access management policies defined by data stewards, and granting just-in-time access for compliant applications,” Prasad says. “By automating essential compliance operations, such as GDPR data delete processing and reporting for SOX auditing, enterprises can have assurances about their data products.”
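
To make the idea of steward-defined policies and just-in-time access more concrete, here is a minimal Python sketch. It assumes a deny-by-default broker that issues time-bound grants capped by a policy; the `AccessPolicy`, `Grant`, and `AccessBroker` names, and the application and product names, are hypothetical and do not represent any actual Microsoft service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessPolicy:
    """Policy a data steward might define for one data product (hypothetical)."""
    product: str
    allowed_apps: frozenset
    max_duration: timedelta


@dataclass(frozen=True)
class Grant:
    product: str
    app: str
    expires_at: datetime


class AccessBroker:
    def __init__(self, policies):
        self._policies = {p.product: p for p in policies}
        self._grants = {}

    def request(self, app, product, duration, now):
        """Issue a time-bound grant, or return None when policy does not allow it."""
        policy = self._policies.get(product)
        if policy is None or app not in policy.allowed_apps:
            return None  # deny by default: no policy or unlisted app means no access
        # Never grant for longer than the steward's maximum.
        grant = Grant(product, app, now + min(duration, policy.max_duration))
        self._grants[(app, product)] = grant
        return grant

    def is_allowed(self, app, product, now):
        """Access holds only while an unexpired grant exists."""
        grant = self._grants.get((app, product))
        return grant is not None and now < grant.expires_at


policy = AccessPolicy(
    product="customer_revenue",
    allowed_apps=frozenset({"forecast_service"}),
    max_duration=timedelta(hours=1),
)
broker = AccessBroker([policy])
t0 = datetime(2020, 10, 19, 9, 0, tzinfo=timezone.utc)

grant = broker.request("forecast_service", "customer_revenue", timedelta(hours=8), now=t0)
denied = broker.request("unknown_app", "customer_revenue", timedelta(minutes=5), now=t0)
```

In this sketch the eight-hour request is silently capped at the one-hour policy maximum, and the grant stops working once it expires, so standing access never accumulates. Passing `now` explicitly keeps the logic easy to test; a production broker would read the clock itself and persist grants.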

Infrastructure optimization

Deployment and infrastructure optimization are critical technical capabilities with implications that impact the performance and economics of data products.

“Data practitioners are not the experts in choosing and configuring the best-fit deployment infrastructure for their data products,” Prasad says.

By automating the best-fit infrastructure selection and deployment of data products, Microsoft has enabled its data practitioners to self-serve in operationalizing their data products without needing to be supported by engineers.

“The infrastructure selection by our automated deployment services is optimized to balance the scaling and performance requirements of data products with responsible infrastructure economics,” he says.
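
One simple way to picture balancing requirements against cost is to choose the cheapest option that still meets a product's declared needs. The Python sketch below does exactly that. The tier names, capacities, and prices are invented for illustration, and the article does not describe how Microsoft's deployment services actually make this choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical deployment tier; all figures are made up for illustration."""
    name: str
    max_rows_per_hour: int
    max_concurrent_queries: int
    cost_per_hour: float


TIERS = [
    Tier("small", 1_000_000, 10, 0.50),
    Tier("medium", 10_000_000, 50, 2.00),
    Tier("large", 100_000_000, 200, 8.00),
]


def select_tier(rows_per_hour, concurrent_queries, tiers=TIERS):
    """Return the cheapest tier that satisfies both requirements."""
    candidates = [
        t for t in tiers
        if t.max_rows_per_hour >= rows_per_hour
        and t.max_concurrent_queries >= concurrent_queries
    ]
    if not candidates:
        raise ValueError("no tier satisfies the stated requirements")
    return min(candidates, key=lambda t: t.cost_per_hour)


# A product that ingests 5 million rows per hour and serves 20 concurrent queries.
chosen = select_tier(rows_per_hour=5_000_000, concurrent_queries=20)
print(chosen.name)
```

The practitioner states what the data product needs, and the selection logic owns the infrastructure knowledge. A real service would likely weigh many more dimensions, such as storage, region, and observed usage over time.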

Maximizing value outcomes with observability

The impact of data products is measured by the business value outcomes that they enable.

“The lifecycle of a data product is a virtuous build, measure, learn, optimize, or pivot cycle,” Prasad says.

Data products must be instrumented to emit telemetry signals, which are monitored to detect anomalies in health and value outcomes. Prasad says that anomaly conditions must be detected and alerted as early as possible and prior to material negative impact.
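
The following Python sketch shows one common way to detect anomalies in a telemetry signal: compare each new reading with a rolling baseline and flag values that deviate by more than a few standard deviations. The z-score technique, the `AnomalyMonitor` class, the thresholds, and the sample freshness readings are all assumptions chosen for illustration, not the method the article's services use.

```python
from collections import deque
from statistics import mean, stdev


class AnomalyMonitor:
    """Flags readings that deviate sharply from a rolling baseline (z-score)."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = max(2, min_samples)  # stdev needs at least two points

    def observe(self, value):
        """Record a reading; return True if it is anomalous versus recent history."""
        is_anomaly = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                is_anomaly = abs(value - mu) / sigma > self.threshold
            else:
                is_anomaly = value != mu  # flat baseline: any change is notable
        if not is_anomaly:
            self.history.append(value)  # keep anomalies out of the baseline
        return is_anomaly


# Hypothetical signal: minutes since a data product was last refreshed.
freshness_minutes = [10, 11, 9, 10, 12, 10, 11, 95, 10]

monitor = AnomalyMonitor()
alerts = [
    (i, value)
    for i, value in enumerate(freshness_minutes)
    if monitor.observe(value)
]
print(alerts)
```

Only the 95-minute reading is flagged, because it sits far outside the spread of the earlier values. Excluding flagged readings from the baseline is a deliberate choice so that one bad reading does not widen the tolerance for the next one; in practice an alert would be routed to the product's owner rather than printed.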

Implementing the observability capabilities that maximize a data product’s value outcomes is typically a technical undertaking for software and operations engineers, and it adds material delay to deployment.

“Constantly monitoring the health of data products is core to maximizing their value outcomes,” Prasad says. “Data product DevOps services automate telemetry, monitoring, and alerting for data products, enabling data practitioners to deploy and operate observable data products like professional software engineers.”

Data product DevOps accelerates digital transformation

Extracting actionable insights and intelligence from data is foundational to achieving the value outcomes of an organization’s digital transformation investments.

With data product DevOps services, data practitioners can build, operate, and optimize reliable and scalable data products in fast turn cycles without needing to be professional software engineers. This accelerates the application of data products to realize outcomes.

“We are on a mission to accelerate Microsoft’s digital transformation with our data product DevOps services investments,” Prasad says. “We are on a learning journey, have made some strides, and have a lot more to learn and share as we surge forward.”

Learn how Microsoft uses data products, like machine learning, to optimize space planning.

Discover how Microsoft has created a modern data governance strategy to accelerate digital transformation.