Editor’s note: This year in review is a sampling of responsible AI research compiled by Aether, a Microsoft cross-company initiative on AI Ethics and Effects in Engineering and Research, as outreach from their commitment to advancing the practice of human-centered responsible AI. Although each paper includes authors who are participants in Aether, the research presented here expands beyond, encompassing work from across Microsoft, as well as with collaborators in academia and industry.
Inflated expectations around the capabilities of AI technologies may lead people to believe that computers can’t be wrong. The truth is AI failures are not a matter of if but when. AI is a human endeavor that combines information about people and the physical world into mathematical constructs. Such technologies typically rely on statistical methods, with the possibility for errors throughout an AI system’s lifespan. As AI systems become more widely used across domains, especially in high-stakes scenarios where people’s safety and wellbeing can be affected, a critical question must be addressed: how trustworthy are AI systems, and how much and when should people trust AI?
As part of their ongoing commitment to building AI responsibly, research scientists and engineers at Microsoft are pursuing methods and technologies aimed at helping builders of AI systems cultivate appropriate trust—that is, building trustworthy models with reliable behaviors and clear communication that set proper expectations. When AI builders plan for failures, work to understand the nature of the failures, and implement ways to effectively mitigate potential harms, they help engender trust that can lead to a greater realization of AI’s benefits.
Pursuing trustworthiness across AI systems captures the intent of multiple projects on the responsible development and fielding of AI technologies. Numerous efforts at Microsoft have been nurtured by its Aether Committee, a coordinative cross-company council comprised of working groups focused on technical leadership at the frontiers of innovation in responsible AI. The effort is led by researchers and engineers at Microsoft Research and from across the company and is chaired by Chief Scientific Officer Eric Horvitz. Beyond research, Aether has advised Microsoft leadership on responsible AI challenges and opportunities since the committee’s inception in 2016.
The following is a sampling of research from the past year representing efforts across the Microsoft responsible AI ecosystem that highlight ways for creating appropriate trust in AI. Facilitating trustworthy measurement, improving human-AI collaboration, designing for natural language processing (NLP), advancing transparency and interpretability, and exploring the open questions around AI safety, security, and privacy are key considerations for developing AI responsibly. The goal of trustworthy AI requires a shift in perspective at every stage of the AI development and deployment life cycle. We’re actively developing a growing number of best practices and tools to help with the shift to make responsible AI more available to a broader base of users. Many open questions remain, but as innovators, we are committed to tackling these challenges with curiosity, enthusiasm, and humility.
Facilitating trustworthy measurement
AI technologies influence the world through the connection of machine learning models—that provide classifications, diagnoses, predictions, and recommendations—with larger systems that drive displays, guide controls, and activate effectors. But when we use AI to help us understand patterns in human behavior and complex societal phenomena, we need to be vigilant. By creating models for assessing or measuring human behavior, we’re participating in the very act of shaping society. Guidelines for ethically navigating technology’s impacts on society—guidance born out of considering technologies for COVID-19—prompt us to start by weighing a project’s risk of harm against its benefits. Sometimes an important step in the practice of responsible AI may be the decision to not build a particular model or application.
Human behavior and algorithms influence each other in feedback loops. In a recent Nature publication, Microsoft researchers and collaborators emphasize that existing methods for measuring social phenomena may not be up to the task of investigating societies where human behavior and algorithms affect each other. They offer five best practices for advancing computational social science. These include developing measurement models that are informed by social theory and that are fair, transparent, interpretable, and privacy preserving. For trustworthy measurement, it’s crucial to document and justify the model’s underlying assumptions, plus consider who is deciding what to measure and how those results will be used.
In line with these best practices, Microsoft researchers and collaborators have proposed measurement modeling as a framework for anticipating and mitigating fairness-related harms caused by AI systems. This framework can help identify mismatches between theoretical understandings of abstract concepts—for example, socioeconomic status—and how these concepts get translated into mathematics and code. Identifying mismatches helps AI practitioners to anticipate and mitigate fairness-related harms that reinforce societal biases and inequities. A study applying a measurement modeling lens to several benchmark datasets for surfacing stereotypes in NLP systems reveals considerable ambiguity and hidden assumptions, demonstrating (among other things) that datasets widely trusted for measuring the presence of stereotyping can, in fact, cause stereotyping harms.
Flaws in datasets can lead to AI systems with unfair outcomes, such as poor quality of service or denial of opportunities and resources for different groups of people. AI practitioners need to understand how their systems are performing for factors like age, race, gender, and socioeconomic status so they can mitigate potential harms. In identifying the decisions that AI practitioners must make when evaluating an AI system’s performance for different groups of people, researchers highlight the importance of rigor in the construction of evaluation datasets.
Making sure that datasets are representative and inclusive means facilitating data collection from different groups of people, including people with disabilities. Mainstream AI systems are often non-inclusive. For example, speech recognition systems do not work for atypical speech, while input devices are not accessible for people with limited mobility. In pursuit of inclusive AI, a study proposes guidelines for designing an accessible online infrastructure for collecting data from people with disabilities, one that is built to respect, protect, and motivate those contributing data.
- Responsible computing during COVID-19 and beyond
- Measuring algorithmically infused societies
- Measurement and fairness
- Stereotyping Norwegian salmon: An inventory of pitfalls in fairness benchmark datasets
- Understanding the representation and representativeness of age in AI data sets
- Designing disaggregated evaluations of AI systems: Choices, considerations, and tradeoffs
- Designing an online infrastructure for collecting AI data from People with Disabilities
Improving human-AI collaboration
When people and AI collaborate on solving problems, the benefits can be impressive. But current practice can be far from establishing a successful partnership between people and AI systems. A promising advance and direction of research is developing methods that learn about ideal ways to complement people with problem solving. In the approach, machine learning models are optimized to detect where people need the most help versus where people can solve problems well on their own. We can additionally train the AI systems to make decisions as to when a system should ask an individual for input and to combine the human and machine abilities to make a recommendation. In related work, studies have shown that people will too often accept an AI system’s outputs without question, relying on them even when they are wrong. Exploring how to facilitate appropriate trust in human-AI teamwork, experiments with real-world datasets for AI systems show that retraining a model with a human-centered approach can better optimize human-AI team performance. This means taking into account human accuracy, human effort, the cost of mistakes—and people’s mental models of the AI.
In systems for healthcare and other high-stakes scenarios, a break with the user’s mental model can have severe impacts. An AI system can compromise trust when, after an update for better overall accuracy, it begins to underperform in some areas. For instance, an updated system for predicting cancerous skin moles may have an increase in accuracy overall but a significant decrease for facial moles. A physician using the system may either lose confidence in the benefits of the technology or, with more dire consequences, may not notice this drop in performance. Techniques for forcing an updated system to be compatible with a previous version produce tradeoffs in accuracy. But experiments demonstrate that personalizing objective functions can improve the performance-compatibility tradeoff for specific users by as much as 300 percent.
System updates can have grave consequences when it comes to algorithms used for prescribing recourse, such as how to fix a bad credit score to qualify for a loan. Updates can lead to people who have dutifully followed a prescribed recourse being denied their promised rights or services and damaging their trust in decision makers. Examining the impact of updates caused by changes in the data distribution, researchers expose previously unknown flaws in the current recourse-generation paradigm. This work points toward rethinking how to design these algorithms for robustness and reliability.
Complementarity in human-AI performance, where the human-AI team performs better together by compensating for each other’s weaknesses, is a goal for AI-assisted tasks. You might think that if a system provided an explanation of its output, this could help an individual identify and correct an AI failure, producing the best of human-AI teamwork. Surprisingly, and in contrast to prior work, a large-scale study shows that explanations may not significantly increase human-AI team performance. People often over-rely on recommendations even when the AI is incorrect. This is a call to action: we need to develop methods for communicating explanations that increase users’ understanding rather than to just persuade.
- Learning to complement humans
- Is the most accurate AI the best teammate? Optimizing AI for teamwork
- Improving the performance-compatibility tradeoff with personalized objective functions
- Algorithmic recourse in the wild: Understanding the impact of data and model shift
- Does the whole exceed its parts? The effect of AI explanations on complementary team performance
- A Bayesian approach to identifying representational errors
Designing for natural language processing
The allure of natural language processing’s potential, including rash claims of human parity, raises questions of how we can employ NLP technologies in ways that are truly useful, as well as fair and inclusive. To further these and other goals, Microsoft researchers and collaborators hosted the first workshop on bridging human-computer interaction and natural language processing, considering novel questions and research directions for designing NLP systems to align with people’s demonstrated needs.
Language shapes minds and societies. Technology that wields this power requires scrutiny as to what harms may ensue. For example, does an NLP system exacerbate stereotyping? Does it exhibit the same quality of service for people who speak the same language in different ways? A survey of 146 papers analyzing “bias” in NLP observes rampant pitfalls of unstated assumptions and conceptualizations of bias. To avoid these pitfalls, the authors outline recommendations based on the recognition of relationships between language and social hierarchies as fundamentals for fairness in the context of NLP. We must be precise in how we articulate ideas about fairness if we are to identify, measure, and mitigate NLP systems’ potential for fairness-related harms.
The open-ended nature of language—its inherent ambiguity, context-dependent meaning, and constant evolution—drives home the need to plan for failures when developing NLP systems. Planning for NLP failures with the AI Playbook introduces a new tool for AI practitioners to anticipate errors and plan human-AI interaction so that the user experience is not severely disrupted when errors inevitably occur.
- Proceedings of the first workshop on bridging human-computer interaction and natural language processing
- Language (technology) is power: A critical survey of “bias” in NLP
- Planning for natural language failures with the AI Playbook
- Invariant language modeling
- On the relationships between the grammatical genders of inanimate nouns and their co-occurring adjectives and verbs
To build AI systems that are reliable and fair—and to assess how much to trust them—practitioners and those using these systems need insight into their behavior. If we are to meet the goal of AI transparency, the AI/ML and human-computer interaction communities need to integrate efforts to create human-centered interpretability methods that yield explanations that can be clearly understood and are actionable by people using AI systems in real-world scenarios.
As a case in point, experiments investigating whether simple models that are thought to be interpretable achieve their intended effects rendered counterintuitive findings. When participants used an ML model considered to be interpretable to help them predict the selling prices of New York City apartments, they had difficulty detecting when the model was demonstrably wrong. Providing too many details of the model’s internals seemed to distract and cause information overload. Another recent study found that even when an explanation helps data scientists gain a more nuanced understanding of a model, they may be unwilling to make the effort to understand it if it slows down their workflow too much. As both studies show, testing with users is essential to see if people clearly understand and can use a model’s explanations to their benefit. User research is the only way to validate what is or is not interpretable by people using these systems.
Explanations that are meaningful to people using AI systems are key to the transparency and interpretability of black-box models. Introducing a weight-of-evidence approach to creating machine-generated explanations that are meaningful to people, Microsoft researchers and colleagues highlight the importance of designing explanations with people’s needs in mind and evaluating how people use interpretability tools and what their understanding is of the underlying concepts. The paper also underscores the need to provide well-designed tutorials.
Traceability and communication are also fundamental for demonstrating trustworthiness. Both AI practitioners and people using AI systems benefit from knowing the motivation and composition of datasets. Tools such as datasheets for datasets prompt AI dataset creators to carefully reflect on the process of creation, including any underlying assumptions they are making and potential risks or harms that might arise from the dataset’s use. And for dataset consumers, seeing the dataset creators’ documentation of goals and assumptions equips them to decide whether a dataset is suitable for the task they have in mind.
- A human-centered agenda for intelligible machine learning
- Datasheets for datasets
- Manipulating and measuring model interpretability
- From human explanation to model interpretability: A framework based on weight of evidence
- Summarize with caution: Comparing global feature attributions
Advancing algorithms for interpretability
Interpretability is vital to debugging and mitigating the potentially harmful impacts of AI processes that so often take place in seemingly impenetrable black boxes—it is difficult (and in many settings, inappropriate) to trust an AI model if you can’t understand the model and correct it when it is wrong. Advanced glass-box learning algorithms can enable AI practitioners and stakeholders to see what’s “under the hood” and better understand the behavior of AI systems. And advanced user interfaces can make it easier for people using AI systems to understand these models and then edit the models when they find mistakes or bias in them. Interpretability is also important to improve human-AI collaboration—it is difficult for users to interact and collaborate with an AI model or system if they can’t understand it. At Microsoft, we have developed glass-box learning methods that are now as accurate as previous black-box methods but yield AI models that are fully interpretable and editable.
Recent advances at Microsoft include a new neural GAM (generalized additive model) for interpretable deep learning, a method for using dropout rates to reduce spurious interaction, an efficient algorithm for recovering identifiable additive models, the development of glass-box models that are differentially private, and the creation of tools that make editing glass-box models easy for those using them so they can correct errors in the models and mitigate bias.
- NODE-GAM: Neural generalized additive model for interpretable deep learning
- Dropout as a regularizer of interaction effects
- Purifying interaction effects with the Functional ANOVA: An efficient algorithm for recovering identifiable additive models
- Accuracy, interpretability, and differential privacy via explainable boosting
- GAM changer: Editing generalized additive models with interactive visualization
- Neural additive models: Interpretable machine learning with neural nets
- How interpretable and trustworthy are GAMs?
- Extracting clinician’s goals by What-if interpretable modeling
- Automated interpretable discovery of heterogeneous treatment effectiveness: A Covid-19 case study
Exploring open questions for safety, security, and privacy in AI
When considering how to shape appropriate trust in AI systems, there are many open questions about safety, security, and privacy. How do we stay a step ahead of attackers intent on subverting an AI system or harvesting its proprietary information? How can we avoid a system’s potential for inferring spurious correlations?
With autonomous systems, it is important to acknowledge that no system operating in the real world will ever be complete. It’s impossible to train a system for the many unknowns of the real world. Unintended outcomes can range from annoying to dangerous. For example, a self-driving car may splash pedestrians on a rainy day or erratically swerve to localize itself for lane-keeping. An overview of emerging research in avoiding negative side effects due to AI systems’ incomplete knowledge points to the importance of giving users the means to avoid or mitigate the undesired effects of an AI system’s outputs as essential to how the technology will be viewed or used.
When dealing with data about people and our physical world, privacy considerations take a vast leap in complexity. For example, it’s possible for a malicious actor to isolate and re-identify individuals from information in large, anonymized datasets or from their interactions with online apps when using personal devices. Developments in privacy-preserving techniques face challenges in usability and adoption because of the deeply theoretical nature of concepts like homomorphic encryption, secure multiparty computation, and differential privacy. Exploring the design and governance challenges of privacy-preserving computation, interviews with builders of AI systems, policymakers, and industry leaders reveal confidence that the technology is useful, but the challenge is to bridge the gap from theory to practice in real-world applications. Engaging the human-computer interaction community will be a critical component.
Reliability and safety
- Avoiding negative side effects due to incomplete knowledge of AI systems
- Split-treatment analysis to rank heterogeneous causal effects for prospective interventions
- Out-of-distribution prediction with invariant risk minimization: The limitation and an effective fix
- Causal transfer random forest: Combining logged data and randomized experiments for robust prediction
- Understanding failures of deep networks via robust feature extraction
- DoWhy: Addressing challenges in expressing and validating causal assumptions
Privacy and security
- Exploring design and governance challenges in the development of privacy-preserving computation
- Accuracy, interpretability, and differential privacy via explainable boosting
- Labeled PSI from homomorphic encryption with reduced computation and communication
- EVA improved: Compiler and extension library for CKKS
- Aggregate measurement via oblivious shuffling
A call to personal action
AI is not an end-all, be-all solution; it’s a powerful, albeit fallible, set of technologies. The challenge is to maximize the benefits of AI while anticipating and minimizing potential harms.
Admittedly, the goal of appropriate trust is challenging. Developing measurement tools for assessing a world in which algorithms are shaping our behaviors, exposing how systems arrive at decisions, planning for AI failures, and engaging the people on the receiving end of AI systems are important pieces. But what we do know is change can happen today with each one of us as we pause and reflect on our work, asking: what could go wrong, and what can I do to prevent it?