Abstract

Research advancements allow computational systems to automatically caption social media images. Often, these captions are evaluated with sighted humans using the image as a reference. Here, we explore how blind and visually impaired people experience these captions in two studies about social media images. Using a contextual inquiry approach (n=6 blind/visually impaired), we found that blind people place a lot of trust in automatically generated captions, filling in details to resolve differences between an image’s context and an incongruent caption. We built on this in-person study with a second, larger online experiment (n=100 blind/visually impaired) to investigate the role of phrasing in encouraging trust or skepticism in captions. We found that captions emphasizing the probability of error, rather than correctness, encouraged people to attribute incongruence to an incorrect caption, rather than missing details. Where existing research has focused on encouraging trust in intelligent systems, we conclude by challenging this assumption and consider the benefits of encouraging appropriate skepticism.