An app for visually impaired people that narrates the world around you
Technology can be such an enabler of good and such an enabler for people to shrink the world, for the world to come closer together, and for people to be able to achieve so much more than they ever could without it.
About Seeing AI
Designed for the blind and low vision community, this research project harnesses the power of AI to describe people, text, currency, color, and objects.
Seeing AI is a Microsoft research project that brings together the power of the cloud and AI to deliver an intelligent app designed to help you navigate your day. Point your phone’s camera, select a channel, and hear a description of what the AI has recognized around you.
With this intelligent camera app, just hold up your phone and hear information about the world around you. Seeing AI can speak short text as soon as it appears in front of the camera, provide audio guidance for capturing a printed page, and recognize and narrate the text along with its original formatting. The app can also scan barcodes with guided audio cues to identify products, recognize and describe people around you and their facial expressions, and describe the scenes around you using the power of AI. An ongoing project, Seeing AI's latest additions are identifying currency bills when paying with cash and describing images in other apps, such as your photo gallery, mail, and Twitter.
- Turns the visual world into an audible experience — with this intelligent camera app, just hold up your phone and hear information about the world around you.
- Recognizes and locates the faces of people you’re with, along with facial characteristics, approximate age, emotion, and more.
- Reads text quickly — hear short snippets of text instantly and get audio guidance to capture full documents.
You’ve probably seen it—the Seeing AI video, which premiered during Satya Nadella’s Build 2016 keynote. The video features Saqib Shaikh, a Seeing AI developer who is visually impaired, using the app to show the power of technology and the potential impact it can have on someone’s life.
You may not know the story behind the project. Seeing AI, formerly known as Deep Vision, is the result of personal experience, advances in research, and the right people with the right experience coming together at just the right time.
It all started with Anirudh Koul, a data scientist working on machine learning and natural language processing in Bing. In early 2014, Anirudh realized that his grandfather, who was gradually losing his vision with age, could no longer recognize him during Skype calls. Anirudh was also aware of an emerging trend in computer vision: image classification error rates were dropping by roughly 50 percent year-over-year, which suggested machines would soon match human accuracy. An early mobile prototype, while promising, left much to be desired in accuracy. His idea to help users who are blind find nearby objects would have to wait.
In just a year, two big breakthroughs changed everything. First, a team of Microsoft researchers developed vision-to-language technology that was recognized as the most humanlike in the world. Equally important, the best image classification system in the world, built by another Microsoft Research team, recorded a 3.57 percent error rate, making it more accurate than humans at recognizing objects in images. And just like that, the building blocks were ready.
Anirudh began recruiting people to join his project, dubbed Deep Vision, for the 2015 Hackathon. He started with the researchers who had developed the vision-to-language technology. He also scoured the Internet to find published accessibility experts within Microsoft. He was told “no” more than once because “the idea seemed too ambitious.” But for everyone who said no, key people said yes.
Every person who joined the Deep Vision Hackathon team brought something unique, often unrelated to their day job. “We came from different places with different qualifications. Some were novices, some were experts, but we were all equal when we joined this project. I think that is the biggest example of One Microsoft you can find,” said Salman Gadit, an engineer with experience building mobile applications and optical character recognizers capable of reading crumpled business cards, a skill that came in handy during the project.
The team spent a substantial amount of time understanding what was available for the visually impaired community, what met a need, and what fell short. They identified three scenarios that drove the solution: mapping indoor spaces in 3-D and navigating them with only a camera, without GPS, Wi-Fi, or beacons; asking questions about text and objects in the physical world; and describing the surroundings.
“It is not that people who can’t see can’t do things and have to use technology. People who can’t see do a ton of things and are incredibly independent. Technology can make life a whole lot more fun and exciting and personal for them. So it’s not just about fixing a problem; it’s about helping enhance the experience for someone who cannot see,” said teammate Mary Bellard, who is also part of the CELA Accessibility team and spent six years at the American Foundation for the Blind in New York.
The strategy worked, and the team won several awards at the 2015 Hackathon, competing with over 13,000 participants worldwide. Four members of the team eventually received full-time funding in 2016, turning their passion project into their day jobs.
The demo at Build 2016 was a great moment for the teams of Microsoft developers and others who helped create the prototype. Among them was a group you might not expect: college students serving as interns. The hackathon team brought their initial Deep Vision prototype to the Garage Internship program at Microsoft Vancouver, where interns jumped at the chance to help build the next version of the app, now called Seeing AI. That work included connecting the app’s servers to Microsoft Cognitive Services, a critical piece of the research project that, for example, reads words aloud into an earpiece and explains images or surroundings.
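The article doesn’t detail that integration, but as a rough illustration of the kind of round trip involved, here is a minimal Python sketch that sends an image to the Azure Computer Vision “describe” REST endpoint (a public successor to the Cognitive Services APIs mentioned above) and picks the best caption to hand to a text-to-speech engine. The endpoint URL and key are placeholders, not Seeing AI’s actual configuration.

```python
import json
import urllib.request

# Hypothetical configuration for illustration only -- these are
# placeholders, and the exact service setup Seeing AI uses is not public.
ENDPOINT = "https://example.cognitiveservices.azure.com"
SUBSCRIPTION_KEY = "<your-subscription-key>"


def describe_image(image_bytes: bytes) -> dict:
    """POST an image to the Computer Vision 'describe' endpoint."""
    request = urllib.request.Request(
        f"{ENDPOINT}/vision/v3.2/describe",
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def best_caption(result: dict) -> str:
    """Pick the highest-confidence caption from a 'describe' response,
    which the app could then speak aloud to the user."""
    captions = result.get("description", {}).get("captions", [])
    if not captions:
        return "No description available."
    top = max(captions, key=lambda c: c["confidence"])
    return top["text"]
```

The real app does far more (channel selection, on-device short-text recognition, audio guidance), but the caption-selection step sketches how a cloud vision response becomes a single spoken sentence.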
Shweta Sharma, then a senior at McMaster University in Ontario and now a full-time Microsoft employee, found the experience especially valuable. “I was inspired by Saqib Shaikh, the Microsoft software developer, blind from an early age, who was one of our sponsors for Seeing AI,” she says. “Having an immense amount of responsibility and ownership over our project is not something you normally experience in an internship.”
The Seeing AI app for iOS was broadly released on July 12, 2017, and had assisted users with more than 3 million tasks by the end of the year. With a large fan following who report using it for many ‘first time in life’ scenarios, the app has also been honored with several awards, including the prestigious Helen Keller Achievement Award from the American Foundation for the Blind.
It has been a long but rewarding journey for a team that knows what can happen when the right people with the right experience come together at the right time.
Top Row – left to right: Nathan Lam, Christiano Bianchet, Elias Haroun
Bottom Row – left to right: Sara Kiani, Karen Lai, Shweta Sharma, Wendy Lu, Juan Henao, Irene Chen, Coach Reza Jooyandeh
Not Pictured – Microsoft employees: Stephane Morichere-Matte, Harleen Thind, Victor Tran
The members of the Deep Vision Hackathon team are (top from left to right, from Silicon Valley) Sherlock Huang, Salman Gadit, Antony Deepak Thomas, Anirudh Koul, Meher Kasam, Serge-Eric Tremblay, Eren Song, (bottom from left to right, from Redmond/Bellevue) Wes Sularz, Mary Bellard, Anne Taylor, Abhinav Shrivastava, Margaret Mitchell, Ross Girshick, Kartik Sawhney, Ishan Misra, Gaurang Prajapati; and (not pictured, from London) Saqib Shaikh.
Updated with currency and color recognition, Seeing AI is available in 35 countries
Microsoft Accessibility Blog
American Foundation for the Blind Announces 2018 Helen Keller Achievement Award Winners
Microsoft is being honored for its significant strides in developing inclusive technologies that empower people with disabilities. Examples include the launch of the Seeing AI app, which narrates the world for people who are blind or have sight loss. Eye Control on Windows 10 is a new input method that allows individuals with severe mobility impairments, such as those caused by Lou Gehrig’s disease, to communicate and use a computer with only the movement of their eyes. Office 365 includes many accessibility improvements, such as optical character recognition in Office Lens, which inputs content directly into Word, Excel, PowerPoint, and OneNote. For entertainment, Xbox One updates like Copilot and the Accessibility API make the system even more accessible to people with disabilities.