Artificial intelligence (AI) and machine learning are making a big impact on how people work, socialize, and live their lives. As consumption of products and services built around AI and machine learning increases, specialized actions must be undertaken to safeguard not only your customers and their data, but also to protect your AI and algorithms from abuse, trolling, and extraction.

We are pleased to announce the release of a research paper, Securing the Future of Artificial Intelligence and Machine Learning at Microsoft, focused on net-new security engineering challenges in the AI and machine learning space, with a strong focus on protecting algorithms, data, and services. This content was developed in partnership with Microsoft’s AI and Research group. It’s referenced in The Future Computed: Artificial Intelligence and its role in society by Brad Smith and Harry Shum, as well as cited in the Responsible bots: 10 guidelines for developers of conversational AI.

This document focuses entirely on security engineering issues unique to the AI and machine learning space, but due to the expansive nature of the InfoSec domain, it’s understood that issues and findings discussed here will overlap to a degree with the domains of privacy and ethics. As this document highlights challenges of strategic importance to the tech industry, the target audience for this document is security engineering leadership industry-wide.

Our early findings suggest that:

  1. Secure development and operations foundations must incorporate the concepts of Resilience and Discretion when protecting AI and the data under its control.
  • AI-specific pivots are required in many traditional security domains such as Authentication, Authorization, Input Validation, and Denial of Service mitigation.
  • Without investments in these areas, AI/machine learning services will continue to fight an uphill battle against adversaries of all skill levels.
  1. Machine learning models are largely unable to discern between malicious input and benign anomalous data. A significant source of training data is derived from un-curated, unmoderated public datasets that may be open to third-party contributions.
  • Attackers don’t need to compromise datasets when they are free to contribute to them. Such dataset poisoning attacks can go unnoticed while model performance inexplicably degrades.
  • Over time, low-confidence malicious data becomes high-confidence trusted data, provided that the data structure/formatting remains correct and the quantity of malicious data points is sufficiently high.
  1. Given the great number of layers of hidden classifiers/neurons that can be leveraged in a deep learning model, too much trust is placed on the output of AI/machine learning decision-making processes and algorithms without a critical understanding of how these decisions were reached.
  • AI/machine learning is increasingly used in support of high-value decision-making processes in medicine and other industries where the wrong decision may result in serious injury or death.
  • AI must have built-in forensic capabilities. This enables enterprises to provide customers with transparency and accountability of their AI, ensuring its actions are not only verifiably correct but also legally defensible.
  • When combined with data provenance/lineage tools, these capabilities can also function as an early form of “AI intrusion detection,” allowing engineers to determine the exact point in time that a decision was made by a classifier, what data influenced it, and whether or not that data was trustworthy.

Our goal is to bring awareness and energy to the issues highlighted in this paper while driving new research investigations and product security investments across Microsoft. Read the Securing the Future of Artificial Intelligence and Machine Learning at Microsoft paper to learn more.