{"id":560082,"date":"2019-01-16T10:00:31","date_gmt":"2019-01-16T18:00:31","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=560082"},"modified":"2019-01-15T11:43:54","modified_gmt":"2019-01-15T19:43:54","slug":"microsoft-ability-initiative-a-collaborative-quest-to-innovate-in-image-captioning-for-people-who-are-blind-or-with-low-vision","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-ability-initiative-a-collaborative-quest-to-innovate-in-image-captioning-for-people-who-are-blind-or-with-low-vision\/","title":{"rendered":"Microsoft Ability Initiative: A collaborative quest to innovate in image captioning for people who are blind or with low vision"},"content":{"rendered":"<div id=\"attachment_560160\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-560160\" class=\"wp-image-560160 size-large\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-1024x576.png\" alt=\"From left to right: Danna Gurari, University of Texas; Ed Cutrell, Microsoft Research; Roy Zimmermann, Microsoft Research; Meredith Ringel Morris, Microsoft Research; Ken Fleischmann, University of Texas; Neel Joshi, Microsoft Research\" width=\"1024\" height=\"576\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-1066x600.png 1066w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-560160\" class=\"wp-caption-text\">From left to right: Danna Gurari, University of Texas; Ed Cutrell, Microsoft Research; Roy Zimmermann, Microsoft Research; Meredith Ringel Morris, Microsoft Research; Ken Fleischmann, University of Texas; Neel Joshi, Microsoft Research<\/p><\/div>\n<p>Microsoft is committed to pushing the boundaries of technology to improve and positively influence all parts of society. Recent advances in deep learning and related AI techniques have resulted in significant strides in automated image captioning. However, current image captioning systems are not well-aligned with the needs of a community that can benefit greatly from them: people who are blind or with low vision.<\/p>\n<p>We recently completed a competitive process to find an academic research team to work with on changing that. We\u2019re excited to partner with The University of Texas at Austin for our new Microsoft Ability Initiative. This companywide initiative aims to create a public dataset that ultimately can be used to advance the state of the art in AI systems for automated image captioning. We recently spent two days with the research team in Austin to kick off this exciting new collaboration.<\/p>\n<p>Microsoft researchers involved in this effort have specialized experience in accessible technologies, human-centric AI systems, and computer vision. 
These researchers\u2019 efforts are complemented by colleagues in other divisions of the company, including the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/ai4prod.microsoftcrmportals.com\/AIAcccessibilityGrantApplication\/\">AI for Accessibility program<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which helps fund the initiative, and Microsoft 365 accessibility. The Microsoft Ability Initiative is one of an increasing number of initiatives at Microsoft in which researchers and product developers are coming together in a new, cross-company push to spur innovative and exciting new research and development in the area of accessible technologies.<\/p>\n<p>\u201cWe are excited about this new initiative,\u201d said Wendy Chisholm, Principal Program Manager with the AI for Accessibility program at Microsoft. \u201cThe goal of creating public data resources that can accelerate innovations with AI that empower people who are blind or with low vision is a fantastic example of the kind of impact Microsoft hopes to have through its <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/aka.ms\/grant\">AI for Accessibility program<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u201d<\/p>\n<p>UT Austin stood out last year from a select number of universities with specialized experience invited to participate in the competitive process to identify an academic partner for the initiative. Principal investigator Professor Danna Gurari and Professor Kenneth R. 
Fleischmann are leading the team at UT Austin, which also includes several graduate students.<\/p>\n<p>Professor Gurari has a record of success in creating public datasets to advance the state of the art in AI and accessibility, having co-founded the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/vizwiz.org\/workshop\/\">VizWiz Grand Challenge<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. The UT Austin team, which we\u2019ll collaborate with over a period of 18 months, plans to take a user-centered approach to the problem, including working with people who are blind or with low vision to better understand their expectations of AI captioning tools. The team also plans to launch community challenges to engage a broad swath of researchers and developers to build these next-generation tools.<\/p>\n<p>\u201cI hope to build a community that links the diversity of researchers and practitioners with a shared interest in developing accessible methods in order to accelerate the conversion of cutting-edge research into market products that assist people who are blind or with low vision in their daily lives,\u201d said Gurari.<\/p>\n<div id=\"attachment_560139\" style=\"width: 778px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-560139\" class=\"wp-image-560139 size-large\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/harrypotter-768x1024.jpg\" alt=\"A family\u2014a grandfather, mother, and two children\u2014dressed in Harry Potter costumes.\" width=\"768\" height=\"1024\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/harrypotter-768x1024.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/harrypotter-225x300.jpg 225w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/harrypotter.jpg 1200w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><p id=\"caption-attachment-560139\" class=\"wp-caption-text\">A state-of-the-art vision-to-language system\u00a0labeled this image as \u201ca group of people posing for the camera.\u201d While not incorrect, the caption excludes many of the details that make the image compelling, such as the fact that it shows a family\u2014a grandfather, a mother, and two children\u2014dressed in Harry Potter costumes. Training AI systems to provide more detailed captions that can offer a richer understanding of images for people who are blind or with low vision is an important goal of this new research initiative.<\/p><\/div>\n<p>This collaboration with UT Austin builds upon prior Microsoft research that has identified a need for new approaches at the intersection of computer vision and accessibility. Such work includes studies on <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/captions_chi2017.pdf\">how end-users who are blind interpret the output of AI image labeling systems<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/08\/scalable_social_alttext.pdf\">the types of detail missing from automated image descriptions<\/a>. We\u2019ve also built a prototype exploring <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/01\/images_chi_two_appendices.pdf\">new techniques for interacting with image captions<\/a> that takes advantage of the more detailed and structured caption content that future AI systems may provide. Our prior research has identified many key challenges in this realm, and we\u2019re looking forward to working with UT Austin to make strides toward actionable solutions. 
Our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/\">Cognitive Services<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-us\/\">Azure cloud computing<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> resources provide a technical foundation that will support the joint research effort.<\/p>\n<p>Professor Gurari noted that the initiative will not only advance the state of the art of vision-to-language technology, continuing the progress Microsoft has made with such tools and resources as the <a href=\"https:\/\/www.microsoft.com\/en-us\/seeing-ai\">Seeing AI mobile phone application<\/a> and the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/cocodataset.org\/#home\">Microsoft Common Objects in COntext (MS COCO) dataset<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, but will also be a teaching opportunity for students at UT Austin.<\/p>\n<p>\u201cI love to see the excitement in so many of my students when they realize that they can use their skills to make a difference in the world, especially for people who are blind or with low vision,\u201d she said.<\/p>\n<p>We came away from our meetings at The University of Texas at Austin even more energized about the potential for this initiative to have real impact on the lives of millions of people around the world, and we couldn\u2019t be more excited. 
We expect that, at the end of this joint effort, the broader research community will leverage the new dataset to jump-start yet another wave of innovative research that will lead to new technologies for people who are blind or with low vision.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft is committed to pushing the boundaries of technology to improve and positively influence all parts of society. Recent advances in deep learning and related AI techniques have resulted in significant strides in automated image captioning. However, current image captioning systems are not well-aligned with the needs of a community that can benefit greatly from [&hellip;]<\/p>\n","protected":false},"author":37074,"featured_media":560160,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Meredith Ringel 
Morris","user_id":"32884"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194460],"tags":[],"research-area":[13556,13545,13555],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-560082","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-search-and-information-retrieval","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144928,283244],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788.png\" class=\"img-object-cover\" alt=\"microsoft ability team stands in front of sign at university of texas at austin\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-1024x576.png 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/NewPartnership_AbilityTeam_AI_Site_12_2018_1400x788-343x193.png 343w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"Meredith Ringel Morris","formattedDate":"January 16, 2019","formattedExcerpt":"Microsoft is committed to pushing the boundaries of technology to improve and positively influence all parts of society. Recent advances in deep learning and related AI techniques have resulted in significant strides in automated image captioning. However, current image captioning systems are not well-aligned with&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/560082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37074"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=560082"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/560082\/revisions"}],"predecessor-version":[{"id":560172,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/560082\/revisions\/560172"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/560160"}],"wp:attachment":[{"href":"https:\/\/www.micr
osoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=560082"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=560082"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=560082"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=560082"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=560082"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=560082"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=560082"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=560082"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=560082"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=560082"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=560082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}