Listening to Users is the Key to Speech Recognition at Microsoft

REDMOND, Wash. — Feb. 1, 2007 — Eventually, speech may be the way that most people want to interact with their PCs and other computing devices. At Microsoft, we are working to make that possible. Speech Recognition is built right into Windows Vista, a big step toward realizing that vision.

With Windows Vista, we had several speech recognition goals based on user feedback about earlier versions of our speech recognition software. When we started this project, we held a number of focus groups, which included people with and without disabilities. From those focus groups, we learned what our users were struggling with, and what they wanted to do, so we could build a system that solved those problems. There are a lot of things that contributed to what we achieved with Windows Vista, but one of the key things was that we started by asking users what they needed.

A Computer That Really Understands You

In the past, our technology wasn't mature enough to distinguish between commands and dictation when they were used together, so we had two separate modes of operation. Users had to tell the computer when they wanted to give voice commands and when they wanted to dictate text. It was a poor user experience, especially for people with disabilities who rely on speech technology to operate their PCs. One goal was to remove that burden by making the system work seamlessly, so the computer could tell on its own whether it was receiving a voice command or dictation.

Interactive Speech Training

We did something innovative in Windows Vista that simplifies user training and makes it easier for people to learn how to use our speech recognition software. We developed an interactive tutorial that provides both detailed instruction and opportunities to practice. Instead of making users read long passages of text so the system can learn how they pronounce various words, we use the words they speak as they work through the tutorial for the same purpose.

Once users complete the tutorial, there is no need for them to do additional training unless their speaking style is very far outside the norm. In those cases, users will have a reasonable experience based on what the system learns during the tutorial, but they will have a better experience if they go back and do some additional training. In either case, the system will continue to learn from them as they do more speaking during regular use.

Hands-Free Computing

Another goal was to make sure that Speech Recognition provided a completely hands-free computing experience. In earlier versions of our speech recognition technology, we weren't able to give users full control over the mouse or the keyboard. They couldn't use speech to press any key they wanted. They even had to physically press a button or click the mouse to turn on the speech recognition feature. For people with severe disabilities, those limitations were often a huge problem. In Windows Vista, we reached that goal: Speech Recognition now supports fully hands-free computing.

Learning from Users

In working toward our speech recognition goals for Windows Vista, we not only set out to achieve them but also measured our progress along the way. Throughout the development process, we established specific objectives for each of our goals and conducted usability tests to measure progress. We tested everything, and we learned from our users. Our approach was to develop several different strategies to meet a particular user need, test them all, and then pick the one that got the best response in the usability studies.

The usability studies also enabled us to discover subtle user preferences that had never occurred to us. For example, most products that use speech technology require users to learn the military alphabet (Alpha, Bravo, Charlie, etc.) to spell out words, such as street names, that they want to add to the recognized vocabulary. We thought a simple mnemonic, such as "A as in Apple" or "B as in Boy," would be easier for users, so we chose the one or two most common words that started with each letter and used those.

Mnemonics turned out to be a good idea, but we didn't go far enough at first. In the usability study, we put the mnemonic on the screen and gave users a spelling exercise to complete. Much to our surprise, even though the mnemonic was right in front of them, many of the users would choose a different word to represent the letter. For example, they would see "A as in Apple," but they might say "A as in Albuquerque." Because of this feedback, Windows Vista now allows users to say "A" as in any word that starts with that letter. That was a larger technological challenge for us, but it was the best solution for our users.

The Satisfaction of Helping Users

I get a lot of satisfaction out of helping users. I visit the newsgroups every day to see what users have to say and to learn what kinds of problems they're having with our products. Besides offering users support today, I can take what I learn into the design phase for the next version of the product, to help make sure that we hit a home run for all future users.

People with disabilities are great customers for us because they tell us what is wrong with the software. The people who use speech recognition because of their disabilities are motivated to help us, and we're motivated to work with them, and the result of that partnership is a better product for all users. After all, when you look at what capabilities users with disabilities and those without disabilities want from speech recognition, there is about an 85 percent overlap. By engaging in this relationship with the disabilities community, we can learn more about the strengths and weaknesses of our speech recognition software and use that knowledge to build a better product with each new version.

Rob Chambers, a Microsoft employee since 1995, leads a team devoted to developing state-of-the-art speech recognition technology for products such as Windows Vista. Rob's essay is part of a series of articles that profiles some of the key Microsoft employees, partners and associates who make it easier for people to see, hear, and use computers.

Rob Chambers
Software Architect
Microsoft
