
Machine Learning for fair decisions

July 17, 2018 | By Miro Dudík, Principal Researcher; John Langford, Principal Researcher; Hanna Wallach, Principal Researcher; Alekh Agarwal, Senior Researcher

Over the past decade, machine learning systems have begun to play a key role in many high-stakes decisions: Who is interviewed for a job? Who is approved for a bank loan? Who receives parole? Who is admitted to a school?

Human decision makers are susceptible to many forms of prejudice and bias, such as those rooted in gender and racial stereotypes. One might hope that machines would be able to make decisions more fairly than humans. However, news stories and numerous research studies have found that machine learning systems can inadvertently discriminate against minorities, historically disadvantaged populations and other groups.

In essence, this is because machine learning systems are trained to replicate the decisions present in their training data, and those decisions reflect society’s historical biases.

Naturally, researchers want to mitigate these biases, but there are several challenges. For example, there are many different definitions of fairness. Should the same number of men and women be interviewed for a job or should the number of men and women interviewed reflect the proportions of men and women in the applicant pool? What about nonbinary applicants? Should machine learning systems even be used in hiring contexts? Answers to questions like these are non-trivial and often depend on societal context. On top of that, re-engineering existing machine learning pipelines to incorporate fairness considerations can be hard. How can you train a boosted-decision-tree classifier to respect specific gender proportions? What about other fairness definitions? What about training a two-layer neural network? Or a residual network? Each of these questions can require many months of research and engineering.

Our work, outlined in a paper titled “A Reductions Approach to Fair Classification” and presented this month at the 35th International Conference on Machine Learning (ICML 2018) in Stockholm, Sweden, focuses on some of these challenges, providing a provably and empirically sound method for turning any common classifier into a “fair” classifier according to any of a wide range of fairness definitions.

To understand our method, consider the process of choosing applicants to interview for a job where it is desirable to have an interview pool that is balanced with respect to gender and race—a fairness definition known as demographic parity. Our method can turn a classifier that predicts who should be interviewed based on previous (potentially biased) hiring decisions into a classifier that predicts who should be interviewed while also respecting demographic parity (or another fairness definition).
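One common way to quantify demographic parity is to compare selection rates, that is, the fraction of each group that receives the positive decision (here, an interview invitation). The short sketch below uses made-up predictions and a hypothetical gender attribute purely to illustrate the quantity being equalized; it is not taken from the paper.

```python
import numpy as np

def selection_rates(predictions, groups):
    """Fraction of positive predictions (e.g., 'invite to interview') per group."""
    return {g: float(predictions[groups == g].mean()) for g in np.unique(groups)}

def demographic_parity_gap(predictions, groups):
    """Largest difference in selection rate between any two groups.
    A gap of 0 means the predictions satisfy demographic parity exactly."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical example: 0/1 interview decisions and a gender attribute.
preds = np.array([1, 0, 1, 1, 0, 0, 1, 0])
gender = np.array(["F", "F", "F", "F", "M", "M", "M", "M"])
print(selection_rates(preds, gender))         # {'F': 0.75, 'M': 0.25}
print(demographic_parity_gap(preds, gender))  # 0.5
```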

Such techniques are called “reductions” because they reduce the problem we wish to solve into a different problem—typically a more standard problem for which many algorithms already exist. Our fair learning reduction is, in fact, a special case of a more general reduction for imposing constraints on classifiers.

Our method operates as a game between the classification algorithm and a “fairness enforcer.” The classification algorithm tries to come up with the most accurate classification rule on possibly reweighted data, while the fairness enforcer checks whether that rule violates the chosen fairness definition. The training data is then reweighted based on the output of the fairness enforcer and passed back to the classification algorithm. For instance, applicants of a certain gender or race might be upweighted or downweighted, so that the classification algorithm is better able to find a classification rule that is fair with respect to the desired gender or racial proportions. Eventually, this process yields a classification rule that is fair according to the fairness definition. Moreover, it is also the most accurate among all fair rules, provided the classification algorithm consistently tries to minimize the error on the reweighted data.
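To make the loop concrete, here is a deliberately simplified, hypothetical sketch of this game for demographic parity. It is not the paper’s exponentiated-gradient algorithm and carries none of its guarantees; it only illustrates the pattern of a learner fitting reweighted data while an enforcer measures selection rates and multiplicatively adjusts the weights. All names (toy_fair_reweighting, step, and so on) are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def toy_fair_reweighting(X, y, groups, n_iters=5, step=0.5):
    """Toy learner-vs-enforcer loop for demographic parity (illustration only)."""
    weights = np.ones(len(y), dtype=float)
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_iters):
        # Learner: best response on the currently reweighted data.
        clf.fit(X, y, sample_weight=weights)
        preds = clf.predict(X)

        # Enforcer: measure selection rates per group.
        rates = {g: preds[groups == g].mean() for g in np.unique(groups)}
        avg_rate = float(np.mean(list(rates.values())))

        # Reweight: boost positive examples in under-selected groups and
        # shrink them in over-selected groups, then hand the data back.
        for g, rate in rates.items():
            weights[(groups == g) & (y == 1)] *= np.exp(step * (avg_rate - rate))
    return clf
```

In the paper, the enforcer’s updates are exponentiated-gradient steps on Lagrange multipliers and the learner solves cost-sensitive classification problems, which is what yields the accuracy and fairness guarantees described above.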

Figure: Our algorithm (the exponentiated-gradient reduction) matches or outperforms baseline approaches across many data sets, classifier types and fairness measures.

In practice, this process takes about five iterations to yield classification rules that are as good as or better than many previous methods tailored to specific fairness definitions. Our method works with many different definitions of fairness and only needs access to protected attributes, such as gender or race, during training, not when the classifier is deployed in an application. Because our method works as a “wrapper” around any existing classifier, it is easy to incorporate into existing machine learning systems.
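For readers who want to try the wrapper pattern, the hedged sketch below assumes the reductions API of the open-source Fairlearn library, whose ExponentiatedGradient mitigator implements this reduction. The data is purely synthetic; the point is that an off-the-shelf classifier is passed in unchanged and the protected attribute is supplied only at fit time.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Hypothetical synthetic data standing in for a hiring data set:
# four numeric features, a binary "interview" label, and a gender attribute.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
gender = rng.choice(["F", "M"], size=500)
y = (X[:, 0] + 0.5 * (gender == "M") + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Wrap an off-the-shelf classifier; the reduction only requires that it
# accept sample weights, which it uses to reweight the data each round.
mitigator = ExponentiatedGradient(
    estimator=GradientBoostingClassifier(),
    constraints=DemographicParity(),
)

# The protected attribute is needed only while fitting ...
mitigator.fit(X, y, sensitive_features=gender)

# ... and not when the trained classifier is used to make predictions.
predictions = mitigator.predict(X)
```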

Our results contribute to the ongoing conversation about developing “fair” machine learning systems. However, there are many important questions that remain to be addressed. For example, we might not have access to protected attributes at training time or might not want to commit to a single definition of fairness. More fundamentally, in order to effectively mitigate impacts of historical bias, it might be insufficient to impose a quantitative definition of fairness; it might be necessary to systematically collect additional data and monitor the effects of decisions over time, or even avoid the use of machine learning systems in some domains.
