Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Microsoft Ability Initiative: A collaborative quest to innovate in image captioning for people who are blind or with low vision

January 16, 2019 | By Meredith Ringel Morris, Sr. Principal Researcher & Research Manager

From left to right: Danna Gurari, University of Texas; Ed Cutrell, Microsoft Research; Roy Zimmermann, Microsoft Research; Meredith Ringel Morris, Microsoft Research; Ken Fleischmann, University of Texas; Neel Joshi, Microsoft Research

Microsoft is committed to pushing the boundaries of technology to improve and positively influence all parts of society. Recent advances in deep learning and related AI techniques have resulted in significant strides in automated image captioning. However, current image captioning systems are not well-aligned with the needs of a community that can benefit greatly from them: people who are blind or with low vision.

We recently completed a competitive process to find an academic research team to work with on changing that. We’re excited to partner with The University of Texas at Austin for our new Microsoft Ability Initiative. This companywide initiative aims to create a public dataset that ultimately can be used to advance the state of the art in AI systems for automated image captioning. We recently spent two days with the research team in Austin to kick off this exciting new collaboration.

Microsoft researchers involved in this effort have specialized experience in accessible technologies, human-centric AI systems, and computer vision. These researchers’ efforts are complemented by colleagues in other divisions of the company, including the AI for Accessibility program, which helps fund the initiative, and Microsoft 365 accessibility. The Microsoft Ability Initiative is one of an increasing number of efforts at Microsoft in which researchers and product developers are coming together in a cross-company push to spur innovative research and development in accessible technologies.

“We are excited about this new initiative,” said Wendy Chisholm, Principal Program Manager with the AI for Accessibility program at Microsoft. “The goal of creating public data resources that can accelerate innovations with AI that empower people who are blind or with low vision is a fantastic example of the kind of impact Microsoft hopes to have through its AI for Accessibility program.”

UT Austin stood out last year among a select group of universities with specialized expertise invited to participate in the competitive process to identify an academic partner for the initiative. Principal investigator Professor Danna Gurari and Professor Kenneth R. Fleischmann are leading the team at UT Austin, which also includes several graduate students.

Professor Gurari has a previous record of success in creating public datasets to advance the state of the art in AI and accessibility, having co-founded the VizWiz Grand Challenge. The UT Austin team, which we’ll collaborate with over a period of 18 months, plans to take a user-centered approach to the problem, including working with people who are blind or with low vision to better understand their expectations of AI captioning tools. The team also plans to launch community challenges to engage a broad swath of researchers and developers to build these next-generation tools.

“I hope to build a community that links the diversity of researchers and practitioners with a shared interest in developing accessible methods in order to accelerate the conversion of cutting-edge research into market products that assist people who are blind or with low vision in their daily lives,” said Gurari.

A state-of-the-art vision-to-language system labeled this image as “a group of people posing for the camera.” While not incorrect, the caption excludes many of the details that make the image compelling, such as the fact that the group is a family (a grandfather, a mother, and two children) dressed in Harry Potter costumes. Training AI systems to provide more detailed captions that offer a richer understanding of images for people who are blind or with low vision is an important goal of this new research initiative.

This collaboration with UT Austin builds upon prior Microsoft research that has identified a need for new approaches at the intersection of computer vision and accessibility. Such work includes studies on how end-users who are blind interpret the output of AI image labeling systems and the types of detail missing from automated image descriptions. We’ve also built a prototype exploring new techniques for interacting with image captions that takes advantage of more detailed and structured caption content future AI systems may provide. Our prior research has identified many key challenges in this realm, and we’re looking forward to working with UT Austin to make strides toward actionable solutions. Our Cognitive Services and Azure cloud computing resources provide a technical foundation that will support the joint research effort.
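To make the captioning scenario above concrete, the sketch below shows one way a client application might select among candidate captions returned by a vision-to-language service, falling back to a request for human description when machine confidence is low. The response shape, the `best_caption` helper, and the `min_confidence` threshold are all illustrative assumptions, loosely modeled on typical captioning API responses rather than the initiative's actual interface.

```python
# Sketch: choosing a caption from a hypothetical vision-to-language
# API response. The JSON shape here is an assumption for illustration,
# not an actual Cognitive Services schema.

def best_caption(response: dict, min_confidence: float = 0.5) -> str:
    """Return the highest-confidence caption, or a fallback message
    suggesting human description when no confident caption exists."""
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        return "No caption available; consider requesting a human description."
    top = max(captions, key=lambda c: c.get("confidence", 0.0))
    if top.get("confidence", 0.0) < min_confidence:
        return f"Low-confidence caption: {top['text']}"
    return top["text"]

# Example response echoing the caption from the blog's family photo:
example = {
    "description": {
        "captions": [
            {"text": "a group of people posing for the camera",
             "confidence": 0.92}
        ]
    }
}
print(best_caption(example))  # a group of people posing for the camera
```

A real assistive tool would likely go further, as the post suggests, surfacing richer, structured caption content rather than a single short sentence.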

Professor Gurari noted that the initiative will not only advance the state of the art of vision-to-language technology, continuing the progress Microsoft has made with such tools and resources as the Seeing AI mobile phone application and the Microsoft Common Objects in COntext (MS COCO) dataset, but it will also be a teaching opportunity for students at UT Austin.

“I love to see the excitement in so many of my students when they realize that they can use their skills to make a difference in the world, especially for people who are blind or with low vision,” she said.

We came away from our meetings at The University of Texas at Austin even more energized about the potential for this initiative to have a real impact on the lives of millions of people around the world, and we couldn’t be more excited. We expect that, at the end of this joint effort, the broader research community will leverage the new dataset to jump-start yet another wave of innovative research leading to new technologies for people who are blind or with low vision.
