“I Can’t Reply with That”: Characterizing Problematic Email Reply Suggestions
In email interfaces, providing users with reply suggestions may simplify or accelerate correspondence. While the “success” of such systems is typically quantified using the number of suggestions selected by users, this ignores the impact of social context, which can change how suggestions are perceived. To address this, we developed a mixed-methods framework involving qualitative interviews and crowdsourced experiments to characterize problematic email reply suggestions. Our interviews revealed issues with over-positive, dissonant, cultural, and gender-assuming replies, as well as contextual politeness. In our experiments, crowdworkers assessed email scenarios that we generated and systematically controlled, showing that contextual factors like social ties and the presence of salutations impact users’ perceptions of email correspondence. These assessments created a novel dataset of human-authored corrections for problematic email replies. Our study highlights the social complexity of providing suggestions for email correspondence, raising issues that may apply to all social messaging systems.
Auditing natural language processing (NLP) systems for computational harms remains an elusive goal. Doing so, however, is critical, as there is a proliferation of language technologies (and applications) that are enabled by increasingly powerful natural language generation and representation models. Computational harms occur not only due to what content is being produced by people, but also due to how content is being embedded, represented, and generated by large-scale and sophisticated language models. This webinar will cover challenges with locating and measuring potential harms that language technologies (and the data they ingest or generate) might surface, exacerbate, or cause. Such harms can range from more overt issues, like surfacing offensive speech or reinforcing stereotypes, to more subtle issues, like nudging users toward undesirable patterns of behavior or triggering memories of traumatic events.

Join Microsoft researchers Su Lin Blodgett and Alexandra Olteanu, from the FATE Group at Microsoft Research Montréal, to examine pitfalls in some state-of-the-art approaches to measuring computational harms in language technologies. For such measurements of harms to be effective, it is important to clearly articulate both 1) the construct to be measured and 2) how the measurements operationalize that construct. The webinar will also give an overview of possible approaches practitioners could take to proactively identify issues that might not be on their radar, and thus effectively track and measure a wider range of issues.

Together, you'll explore:
Possible pitfalls when measuring computational harms in language technologies
Challenges to identifying what harms we should be measuring
Steps toward anticipating computational harms

Resource list:
A Critical Survey of “Bias” in NLP (Publication)
When Are Search Completion Suggestions Problematic? (Publication)
Social Data (Publication)
Characterizing Problematic Email Reply Suggestions (Publication)
Overcoming Failures of Imagination in AI Infused System Development and Deployment (Publication)
Defining Bias with Su Lin Blodgett (Podcast)
Language, Power and NLP (Podcast)
Su Lin Blodgett (researcher profile)
Alexandra Olteanu (researcher profile)

*This on-demand webinar features a previously recorded Q&A session and open captioning.

Explore more Microsoft Research webinars: https://aka.ms/msrwebinars