P = NP: Cloud data protection in vulnerable non-production environments

Data is the holy grail of your cloud workloads for attackers. Data breaches are the kind of breaches that make the news. With the recent European Union General Data Protection Regulations (GDPR), they will make even bigger headlines. From an enterprise point of view, the most challenging aspect of protecting data is knowing what it is and where it resides. Only when these two questions are answered can you drive data protection via organizational policies.

Most of your sensitive data is collected in production environments—the environments you know that you need to protect, and you usually do. But this is only part of the story. Even though best practices mandate that sensitive information be scrubbed before it transits in the organization, this cannot be ensured. It stands in contradiction to the growing adoption and improvements of the shift-left testing concept, as well as other business needs.

Shift-left testing is the movement of testing to earlier stages in the development lifecycle. Mature testing in early stages is appreciated as it helps developers find problems earlier and in a more cost-effective manner. It also helps quality assurance teams to reproduce bugs in the system and accelerates the debugging processes.

There are other business needs for pulling data to non-production environments. In the research and analytics space, data scientists and analysts prefer to use real data to do their research effectively, whether to offer models that improve the production systems, to perform forensic and log analysis, or to bring insight to product, strategy, and marketing teams, to name a few. In the customer service space, helpdesk personnel may need to pull sensitive records to allow them to perform their jobs efficiently.

For these purposes and others, production data is being pulled not only to the staging environment, but also to development and test environments, as well as research and analytics environments. Data may even reach personal or team playgrounds. Oftentimes, the reality is that organizations disperse data across various environments, making it hard to keep track of what and where.

The following schematic depicts the flow of code from development environments to staging and production environments, along with the flow of production data back to staging, development, and research environments to allow for mature testing and business improvement at earlier stages. The latter flow may even continue to leak outside the organization’s IT.

A diagram showing code push and data pull during deployments

From a security point of view, the data pull should be protected, and sensitive data should not be present in a non-production environment. Synthetic fake data generation should be applied when possible, and format-preserving masking should be applied when data needs to be more realistic. However, not using real data will always impose some loss of data properties and, in turn, the data will always lack some characteristics that may be crucial for testing, and certainly for research. Therefore, to enable advanced testing at earlier stages and allow for better analytics, real data will keep being pulled out of production environments, and the associated risk will be spread throughout the organization’s data stores.

To address this risk, applying perimeter solutions is a good start. But if this is your answer to the risk, then you should think again! Are you sure that once an attacker gets a hold of your sensitive data, he cannot evade detection? Are you sure that you have no malicious insiders? What is a perimeter in the cloud?

Let’s take a step back and rethink the basics of what is needed from a data protection solution: beyond basic security requirements, such as role-based access control, multifactor authentication, setting up firewalls, and encrypting data at rest and data in transit, advanced threat protection should be deployed. This comprises of:

Visibility on where your sensitive data resides, what type of sensitive data it is, and who is accessing this data and how.
Understanding the vulnerabilities of your data stores and being able to fix them.
Detecting the threats and attempts made to infiltrate your data stores.

Any subset of these capabilities is going to leave weak spots in your organization’s posture. For instance, if you have visibility regarding the whereabouts of sensitive data, but no knowledge of the vulnerabilities of your databases, can you be sure that any attempt to infiltrate/exfiltrate your database is detected? Test environments are commonly targeted for data breaches where real data is used for testing and development purposes, like the recent example of Shutterfly.

In addition, if you have a vulnerability in a non-production resource, most likely it exists in similar production resources as well. Finding this out gives a great deal of leverage in reconnaissance terms to attackers. They can probe and investigate non-production environments to find weak spots, then apply them to production environments, minimizing their contact with your production environments, and minimizing the probability of being caught by your threat detection solutions—in case the latter is only deployed on your production environments.

This establishes the following imperative: data protection must be an organization-wide solution, not only a production environment deployment! Hence, P = NP.

From a cloud workload protection perspective, you should build a vision of how to protect your data resources that considers your IT, DevOps, and research methodologies, as well as your data stewardship practices. Deriving a roadmap for this vision requires a solution that allows you to discover your organization’s data resources, including any resources in your shadow IT infrastructure. The outcome of this methodic process—whether it’s manual, semi-automated, or fully automated—should be a mapping of your data estate across all sorts of environments and an associated risk statement with each resource. This evaluation gives you a metric and can be used as a compass to secure your organization. The resources that were deemed eligible for advanced security should then be continuously monitored with advanced threat prevention solutions that keep you alerted with the vulnerabilities of your resources, the sensitivity of your data, and a real-time threat detection capability. Therefore, when we are asked by customers whether they should protect their non-production environments, our answer is: P = NP!

Azure Security Center is a great built-in tool with Azure that can help you protect all your environments. It helps you assess the security state of your cloud resources, both production and non-production environments and provides advanced threat protection against evolving threats. You can start a free trial for Azure and the Security Center, or if you’re already using Azure, just open the Security Center blade to start using it today.

P = NP: Cloud data protection in vulnerable non-production environments

Tags

Content types

Products and services

Topics

Tamer Salman

Related posts

Recent enhancements for Microsoft Power Platform governance

Microsoft Security—detecting empires in the cloud

Microsoft announces new Project OneFuzz framework, an open source developer tool to find and fix bugs at scale

Get started with Microsoft Security

Share