Microsoft Research to present latest findings on fairness in socio-technical systems at FAT* 2019


Researchers from Microsoft Research will present a series of studies and insights relating to fairness in machine learning systems and allocations at the FAT* Conference—the new flagship conference for fairness, accountability, and transparency in socio-technical systems—to be held from January 29–31 in Atlanta, Georgia.

Presented across four papers and covering a broad spectrum of domains, the research reflects Microsoft Research’s commitment to fairness in the automated systems that shape human experience as those systems are rapidly adopted in a growing number of societal contexts.

Bias in bios



In “Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting,” Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Kalai look closely at presumptions and realities regarding gender bias in occupation classification, shedding light on risks inherent in using machine learning in high-stakes settings, as well as on the difficulties that arise when trying to promote fairness by scrubbing explicit gender indicators, such as first names and pronouns, from online bios.

Online recruiting and automated hiring are an enormously impactful societal domain in which the use of machine learning is increasingly popular—and in which unfair practices can lead to unexpected and undesirable consequences. Maintaining an online professional presence has become indispensable for people’s careers, and the data making up that presence often ends up in automated decision-making systems that advertise open positions and recruit candidates for jobs and other professional opportunities. To execute these tasks, a system must be able to accurately assess people’s current occupations, skills, interests, and, more subjective but no less real, their potential.

“Automated decision-making systems are playing an increasingly active role in shaping our lives—and their predictions today even go as far as to affect the world we will live in tomorrow,” said Hanna Wallach, Principal Researcher at Microsoft Research New York City. “For example, machine learning is becoming increasingly popular in online recruiting and automated hiring contexts. Many of us have had jobs or other professional opportunities automatically suggested to us based on our online professional presences, and we were curious how much these recommendations might be affected by something like our genders.”

The researchers created a new dataset of hundreds of thousands of online biographies and were able to show that occupation classifiers exhibit significant true positive rate (TPR) gender gaps when using three different semantic representations—bag-of-words, word embeddings, and deep recurrent neural networks. They were also able to show that the correlation between these TPR gender gaps and existing gender imbalances in occupations may compound the imbalances. They performed simulations demonstrating that imbalances are especially problematic if people repeatedly encounter occupation classifiers because they cause underrepresented genders to become even further underrepresented.
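To make the TPR gender gap concrete, here is a minimal sketch of how such a gap could be computed per occupation. The function name `tpr_gender_gap`, the record layout, and the toy data are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from collections import defaultdict

def tpr_gender_gap(records):
    """Return {occupation: TPR for "F" minus TPR for "M"}.

    Each record is a (true_occupation, predicted_occupation, gender) tuple.
    Gender is "F" or "M" here, a binary simplification the article itself flags.
    """
    hits = defaultdict(int)    # (occupation, gender) -> correctly classified count
    totals = defaultdict(int)  # (occupation, gender) -> people truly in that occupation
    for true_occ, pred_occ, gender in records:
        totals[(true_occ, gender)] += 1
        if pred_occ == true_occ:
            hits[(true_occ, gender)] += 1

    gaps = {}
    for occ in {o for o, _ in totals}:
        # Only report a gap when both genders appear in the occupation.
        if totals[(occ, "F")] and totals[(occ, "M")]:
            tpr_f = hits[(occ, "F")] / totals[(occ, "F")]
            tpr_m = hits[(occ, "M")] / totals[(occ, "M")]
            gaps[occ] = tpr_f - tpr_m
    return gaps

# Hypothetical toy predictions, invented for this example.
toy_records = [
    ("surgeon", "surgeon", "M"), ("surgeon", "surgeon", "M"),
    ("surgeon", "surgeon", "M"), ("surgeon", "physician", "M"),
    ("surgeon", "surgeon", "F"), ("surgeon", "nurse", "F"),
    ("nurse", "nurse", "F"), ("nurse", "nurse", "F"),
    ("nurse", "nurse", "M"), ("nurse", "surgeon", "M"),
]
print(tpr_gender_gap(toy_records))
```

A negative value means women truly in that occupation are recognized less often than men in it; a positive value means the reverse.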

The researchers observed that because biographies are typically written in the third person by their subjects (or people familiar with their subjects) and because pronouns are often gendered in English, they were able to extract subjects’ (likely) self-identified binary genders from the biographies. But they took pains to point out that a binary model of gender is a simplification that fails to capture important aspects of gender and erases people who do not fit within its assumptions.

“We found that when explicit gender indicators—such as first names and pronouns—are present, machine learning classifiers trained to predict people’s occupations do much worse at correctly predicting the occupations of women in stereotypically male professions and men in stereotypically female professions,” said Wallach.

Even when such gender indicators are scrubbed, these performance differences, though less pronounced, remain. In addition to the realization that scrubbing explicit gender indicators isn’t enough to remove gender bias from occupation classifiers, the researchers discovered that even in the absence of such indicators, TPR gender gaps are correlated with existing gender imbalances in occupations. That is, occupation classifiers may in fact exacerbate existing gender imbalances.
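As a rough illustration of what scrubbing explicit gender indicators can look like, the sketch below replaces a small, assumed list of gendered pronouns and titles, plus a supplied first name, with a placeholder. The word list, the `scrub_bio` name, and the sample bio are hypothetical and are not the researchers’ actual procedure.

```python
import re

# Assumed, deliberately short list of explicit gender indicators.
GENDERED_WORDS = r"\b(he|she|him|her|his|hers|himself|herself|mr|mrs|ms)\b\.?"

def scrub_bio(bio, first_name):
    """Replace gendered pronouns, titles, and the given first name with "_"."""
    pattern = re.compile(
        GENDERED_WORDS + r"|\b" + re.escape(first_name) + r"\b",
        re.IGNORECASE,
    )
    scrubbed = pattern.sub("_", bio)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", scrubbed).strip()

bio = "Jane Doe is a surgeon. She trained in Boston, and her research covers trauma care."
print(scrub_bio(bio, "Jane"))
```

Everything else in the bio, such as the specialization it mentions, is left intact, which is one way a classifier could still pick up gender-correlated signals after scrubbing.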

“These findings have very real implications in that they suggest that machine learning classifiers trained to predict people’s occupations may compound or worsen existing gender imbalances in some occupations,” said Wallach.

The findings also suggested that there are differences between men’s and women’s online biographies other than explicit gender indicators, perhaps because of the varying ways that men and women present themselves or their having different specializations within various occupations.

“Our paper highlights both the risks of using machine learning in a high-stakes setting and the difficulties inherent in trying to promote fairness by ‘scrubbing’ sensitive attributes, such as gender,” Wallach said.

Although the researchers focused on gender bias, they noted that other biases, such as those involving race or socioeconomic status, may also be present in occupation classification or in other tasks related to online recruiting and automated hiring.

Sending signals

In a world in which personal data drives more and more decision-making, both consequential and routine, there is a growing interest in the ways in which such data-driven decision-making has the potential to reinforce or amplify injustices. In “Access to Population-Level Signaling as a Source of Inequality,” Nicole Immorlica, Katrina Ligett, and Juba Ziani examine the idea of fairness through an economic lens, finding that disparity in the data available to unbiased decision-makers, who optimize determinations to fit their specific needs, results in one population gaining a significant advantage over another.

The researchers studied access to population-level signaling as a source of bias in outcomes. Population-level strategic signalers can serve as advocates for owners of personal data by filtering or noising data in hopes of improving individuals’ prospects by making it more challenging for decision-makers to distinguish between high- and low-quality candidates. An example is high schools that, to increase the chances their students will be admitted to prestigious universities, inflate grades, refrain from releasing data on class rankings, and provide glowing recommendation letters for more than just the top students.

The sophistication of the signaling that a school might engage in—how strategic the school is in its data reporting versus how revealing it is (simply reporting the information it collects on its students directly to a university)—makes an enormous difference in outcomes. As expected, strategic schools with accurate information about their students have a significant advantage over revealing schools—and strategic schools get more of their students, including unqualified ones, admitted by a university.
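The gap between strategic and revealing schools can be illustrated with a textbook-style toy model. This is a simplification for intuition, not the model analyzed in the paper. It assumes a fraction `p_high` of a school’s students are high quality, the school knows each student’s type exactly, and the university admits a student only if the probability that the student is high quality, given the school’s report, is at least `threshold`. All names and numbers are hypothetical.

```python
def admitted_fraction(p_high, threshold, strategic):
    """Fraction of a school's students admitted under the toy model.

    A revealing school reports each student's true type, so only the
    high-quality fraction is admitted. A strategic school recommends every
    high-quality student plus just enough low-quality ones that the
    recommended pool still meets the university's threshold exactly.
    """
    if not 0 < threshold <= 1 or not 0 <= p_high <= 1:
        raise ValueError("p_high must be in [0, 1] and threshold in (0, 1]")
    if not strategic:
        return p_high
    # Pool size m solves p_high / m = threshold, capped at the whole school.
    return min(1.0, p_high / threshold)

p_high, threshold = 0.3, 0.6
revealing = admitted_fraction(p_high, threshold, strategic=False)
strategic = admitted_fraction(p_high, threshold, strategic=True)
print(revealing, strategic, strategic - p_high)  # last value: unqualified admits
```

In this toy setting the strategic school places more students than the revealing one, and the difference consists entirely of students the university would have rejected under full information.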

“One of the many sources of unfairness is that disadvantaged groups often lack the ability to signal their collective quality to decision-makers, meaning each individual must prove their worth on their own merits,” said Principal Researcher Nicole Immorlica of Microsoft Research New England and New York City. “In comparison, members of advantaged groups are often lumped together, causing individuals to acquire the average quality of their peers in the mind of the decision-maker.”

The researchers go on to derive an optimal signaling scheme for a high school and demonstrate that disparities in ability to signal strategically can constitute a significant source of inequality. The researchers also examine the potential for standardized tests to ameliorate the problem, concluding that such tests are limited in their ability to address strategic signaling inequities and may even exacerbate these inequities in some settings.

“By looking at fairness through an economic lens, we can uncover purely structural sources of unfairness that persist even when unbiased decision-makers act only to maximize their own benefit,” said Immorlica.

Strategic manipulation

In “The Disparate Effects of Strategic Manipulation,” Lily Hu, Nicole Immorlica, and Jenn Wortman Vaughan show how the expanding realm of algorithmic decision-making can change the way that individuals present themselves to obtain an algorithm’s approval and how this can lead to increased social stratification.

“We study an aspect of algorithmic fairness that has received relatively little attention: the disparities that can arise from different populations’ differing abilities to strategically manipulate the way that they appear in order to be classified a certain way,” said Jenn Wortman Vaughan, Senior Researcher at Microsoft Research New York City. “Take the example of college admissions. Suppose that admissions decisions incorporate SAT scores as a feature. Knowing that SAT scores impact decisions will prompt students who have the means to do so to boost their scores, say by taking SAT prep courses.”

As Lily Hu, a research intern from Harvard and lead author on the paper, put it, “Classifiers don’t just evaluate their subjects, but can animate them, as well.” That is, the very existence of a classifier causes people to react. This becomes a problem when not every person has equal access to resources like test prep classes in the example of college admissions or interview coaching in the domain of automated hiring. Even when an algorithm draws on features that seem to reflect individual merit, these metrics can be skewed to favor those who are more readily able to alter their features.
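One way to see how unequal access to manipulation skews outcomes is a small sketch with a fixed score threshold. It assumes each candidate gains a fixed `benefit` from acceptance and pays `cost_per_point` to raise an observed score by one point, so a candidate games the score whenever the total cost does not exceed the benefit. These assumptions, the function names, and the numbers are illustrative only; the paper’s game-theoretic model, in which the decision-maker also chooses the classifier, is not reproduced here.

```python
def effective_cutoff(threshold, cost_per_point, benefit):
    """Lowest true score from which boosting up to `threshold` is still worthwhile."""
    return threshold - benefit / cost_per_point

def accepted(true_scores, threshold, cost_per_point, benefit):
    """True scores of candidates who end up at or above the threshold."""
    cutoff = effective_cutoff(threshold, cost_per_point, benefit)
    return [s for s in true_scores if s >= cutoff]

true_scores = [50, 60, 70, 80]   # identical true scores in both groups
threshold, benefit = 75, 100.0

# The advantaged group can raise scores cheaply; the disadvantaged group cannot.
advantaged = accepted(true_scores, threshold, cost_per_point=5, benefit=benefit)
disadvantaged = accepted(true_scores, threshold, cost_per_point=20, benefit=benefit)
print(advantaged)     # [60, 70, 80]
print(disadvantaged)  # [70, 80]
```

Both groups start with the same true scores, yet the group with cheaper access to score-boosting resources sees more of its members accepted.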

The researchers believe their work highlights a likely consequence of the expansion of algorithmic decision-making in a world that is marked by deep social inequalities. They demonstrate that the design of classification systems can grant undue rewards to those who appear more meritorious under a particular conception of merit while justifying exclusions of those who have failed to meet those standards. These consequences serve to exacerbate existing inequalities.

“Our game theoretic analysis shows how the relative advantage of privileged groups can be perpetuated in settings like this and that this problem is not so easy to fix,” explained Wortman Vaughan. “For example, coming back to the college admissions example, we show that providing subsidies on SAT test prep courses to disadvantaged groups can have the counterintuitive effect of making those students worse off since it allows the bar for admissions to be set higher.”

“It is important to study the impacts of interventions in stylized models in order to illuminate the potential pitfalls,” added fellow researcher Nicole Immorlica.

Allocating the indivisible

“Fair Allocation through Competitive Equilibrium from Generic Incomes,” by Moshe Babaioff, Noam Nisan, and Inbal Talgam-Cohen, examines an underexplored area of theory: notions of fairness as applied to the allocation of indivisible items among players possessing different entitlements in settings without money. Imagine a scenario in which two food banks cater to populations of different sizes with different needs and must divide a donation of food items between them. What will constitute a “fair” allocation of the available items?

These scenarios arise frequently in the context of real-life allocation decisions, such as allocating donations to food banks, allocating courses to students, distributing shifts among workers, and even sharing computational resources across a university or company. The researchers sought to develop notions of fairness that apply to these types of settings and opted for an approach that would study fairness through the prism of competitive market equilibrium, even for cases in which entitlements differ.

Focusing on market equilibrium theory for the Fisher market model, the researchers developed new fairness notions through a classic connection to competitive equilibrium. The first notion generalizes the well-known procedure of dividing a cake fairly between two kids (the first kid cuts the cake, and the second picks a piece) to the case of unequal entitlements and indivisible goods. The second notion ensures that when both players cannot be given what they deserve, as much as possible goes to the one who received less than they should.
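For intuition, the classic cut-and-choose procedure can be sketched for indivisible items in the simplest case: two players with equal entitlements and additive values. This is only the starting point that the paper generalizes, not the paper’s notion for unequal entitlements. The function name, the brute-force search, and the food-bank values below are hypothetical.

```python
from itertools import combinations

def cut_and_choose(cutter_values, chooser_values):
    """Return (cutter_bundle, chooser_bundle) as frozensets of items.

    The cutter splits the items into two bundles so that the worse bundle,
    by the cutter's own additive values, is as good as possible. The chooser
    then takes whichever bundle the chooser values more.
    """
    items = sorted(cutter_values)
    total = sum(cutter_values.values())
    best_split, best_min = None, -1
    # Brute force over all subsets; fine for a handful of items.
    for r in range(len(items) + 1):
        for bundle in combinations(items, r):
            value = sum(cutter_values[i] for i in bundle)
            worst = min(value, total - value)
            if worst > best_min:
                best_min = worst
                best_split = (frozenset(bundle), frozenset(items) - frozenset(bundle))

    first, second = best_split
    def chooser_value(bundle):
        return sum(chooser_values[i] for i in bundle)
    if chooser_value(first) >= chooser_value(second):
        return second, first
    return first, second

# Hypothetical values two food banks place on four donated items.
bank_a = {"rice": 6, "beans": 4, "pasta": 3, "milk": 3}
bank_b = {"rice": 2, "beans": 5, "pasta": 2, "milk": 5}
a_bundle, b_bundle = cut_and_choose(bank_a, bank_b)
print(sorted(a_bundle), sorted(b_bundle))
```

Whichever bundle the chooser takes, the cutter keeps a bundle worth at least the value of the worse half of the cutter’s own best split, which is the guarantee the procedure is known for.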

“Our paper shows that for allocation of goods, market equilibrium ensures some attractive fairness properties even when people have different entitlements for the goods,” said Moshe Babaioff, Senior Researcher at Microsoft Research. “And although such market equilibria might fail to exist for some entitlements, we show that this is a knife’s-edge phenomenon that disappears once entitlements are slightly perturbed.”

Don’t miss the tutorial

In addition to the papers previewed here, there are many other exciting happenings at FAT* 2019. On the first day of the conference, Microsoft Research attendees, along with researchers from Spotify and Carnegie Mellon University, will be giving a tutorial titled “Challenges of Incorporating Algorithmic Fairness into Industry Practice.” The tutorial draws on semi-structured interviews, a survey of machine learning practitioners, and the presenters’ own practical experiences to provide an overview of the organizational and technical challenges that occur when translating research on fairness into practice.

These efforts reflect Microsoft Research’s commitment to fairness, accountability, transparency, and ethics in AI and machine learning systems. The FATE research group at Microsoft studies the complex social implications of AI, machine learning, data science, large-scale experimentation, and increasing automation. A relatively new group, FATE is working on collaborative research projects that address these larger issues, including interpretability. FATE publishes across a variety of disciplines, including machine learning, information retrieval, sociology, algorithmic economics, political science, science and technology studies, and human-computer interaction.

We look forward to sharing our work at FAT* 2019. Hope to see you there!