Microsoft Research Blog

Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Machine Learning for fair decisions

July 17, 2018 | By Miro Dudík, Principal Researcher; John Langford, Principal Researcher; Hanna Wallach, Principal Researcher; Alekh Agarwal, Senior Researcher

Over the past decade, machine learning systems have begun to play a key role in many high-stakes decisions: Who is interviewed for a job? Who is approved for a bank loan? Who receives parole? Who is admitted to a school?

Human decision makers are susceptible to many forms of prejudice and bias, such as those rooted in gender and racial stereotypes. One might hope that machines would be able to make decisions more fairly than humans. However, news stories and numerous research studies have found that machine learning systems can inadvertently discriminate against minorities, historically disadvantaged populations and other groups.

In essence, this is because machine learning systems are trained to replicate decisions present in the data with which they are trained and these decisions reflect society’s historical biases.

Naturally, researchers want to mitigate these biases, but there are several challenges. For example, there are many different definitions of fairness. Should the same number of men and women be interviewed for a job or should the number of men and women interviewed reflect the proportions of men and women in the applicant pool? What about nonbinary applicants? Should machine learning systems even be used in hiring contexts? Answers to questions like these are non-trivial and often depend on societal context. On top of that, re-engineering existing machine learning pipelines to incorporate fairness considerations can be hard. How can you train a boosted-decision-tree classifier to respect specific gender proportions? What about other fairness definitions? What about training a two-layer neural network? Or a residual network? Each of these questions can require many months of research and engineering.

Our work, outlined in a paper titled, “A Reductions Approach to Fair Classification,” presented this month at the 35th International Conference on Machine Learning (ICML 2018) in Stockholm, Sweden, focuses on some of these challenges, providing a provably and empirically sound method for turning any common classifier into a “fair” classifier according to any of a wide range of fairness definitions.

To understand our method, consider the process of choosing applicants to interview for a job where it is desirable to have an interview pool that is balanced with respect to gender and race—a fairness definition known as demographic parity. Our method can turn a classifier that predicts who should be interviewed based on previous (potentially biased) hiring decisions into a classifier that predicts who should be interviewed while also respecting demographic parity (or another fairness definition).

Such techniques are called “reductions” because they reduce the problem we wish to solve into a different problem—typically a more standard problem for which many algorithms already exist. Our fair learning reduction is, in fact, a special case of a more general reduction for imposing constraints on classifiers.

Our method operates as a game between the classification algorithm and a “fairness enforcer.” The classification algorithm tries to come up with the most accurate classification rule on possibly-reweighted data, while the fairness enforcer checks the chosen fairness definition. The training data is reweighted based on the output of the fairness enforcer and passed back to the classification algorithm. For instance, applicants of a certain gender or race might be upweighted or downweighted, so that the classification algorithm is better able to find a classification rule that is fair with respect to the desired gender or racial proportions. Eventually, this process yields a classification rule that is fair, according to the fairness definition. Moreover, it is also the most accurate among all fair rules, provided the classification algorithm consistently tries to minimize the error on the reweighted data.

Our algorithm (exp. grad. reduction) matches or outperforms baseline approaches across many data sets, classifier types and fairness measures.

In practice, this process takes about five iterations to yield classification rules that are as good as or better than many previous methods tailored to specific fairness definitions. Our method works with many different definitions of fairness and only needs access to protected attributes, such as gender or race, during training, not when the classifier is deployed in an application. Because our method works as a “wrapper” around any existing classifier, it is easy to incorporate into existing machine learning systems.

Our results contribute to the ongoing conversation about developing “fair” machine learning systems. However, there are many important questions that remain to be addressed. For example, we might not have access to protected attributes at training time or might not want to commit to a single definition of fairness. More fundamentally, in order to effectively mitigate impacts of historical bias, it might be insufficient to impose a quantitative definition of fairness; it might be necessary to systematically collect additional data and monitor the effects of decisions over time, or even avoid the use of machine learning systems in some domains.

Related Links:

Up Next

Artificial intelligence

Deep Learning Indaba 2018: Strengthening African machine learning

At the 30th conference on Neural Information Processing in 2016, one of the world’s foremost gatherings on machine learning, there was not a single accepted paper from a researcher at an African institution. In fact, for the last decade, the entire African continent has been absent from the contemporary machine learning landscape. The following year, […]

Tempest van Schaik

Software Engineer, AI & Data Science, Commercial Software Engineering

Artificial intelligence

Teaching AI to make decisions and communicate

The term ‘Artificial Intelligence’ may have been coined way back in the 1950s, but we have yet to see the types of machines described in books and films. In our daily lives and workplaces, we aspire to having machines that can help us get things done and make the most of our time. For an […]

Microsoft blog editor

Artificial intelligence

Substance, not hype, powers AI excitement at premier machine learning conference

By Christopher Bishop, Distinguished Scientist and Director of Microsoft Research Cambridge Lab This month, I will attend the Conference and Workshop on Neural Information Processing Systems (NIPS), the premier gathering in the machine learning field. I’ve participated in this conference most years since it began in 1987 and I’m looking forward once again to catching […]

Microsoft blog editor