Technology

Through speech R&D leadership, Microsoft delivers breakthrough experiences - new ways of interacting with computers that feel easy, natural, and human.

Around the time when many consumers unwrapped their first keyboards and started clicking their way through the Internet, Microsoft was already thinking ahead to creating more natural user interfaces. In the mid-1990s, we devoted resources to speech and language processing, employing top speech scientists from around the world and giving them room to explore natural language and speech recognition.

Today Microsoft employs many of the most respected speech engineers, who work directly on our speech engines and platforms. Our team of scientists has grown into the Microsoft Speech Labs, where natural language and speech findings are applied to make our lives easier, better, more fun, and more productive.

The results are the "Say it. Get it" speech recognition and synthesis capabilities in products ranging from Xbox Kinect for fun to Windows Phone 7 for life and work.

In this section, you can learn about what’s behind Microsoft Tellme speech technologies, what goes into creating them, and how they help you in practical ways.

Microsoft Tellme speech engines power all of Microsoft's speech-enabled products and services. Behind all speech technology are two speech engines: speech recognition (computers "hearing" and understanding human speech) and speech synthesis (computers talking, as with text-to-speech [TTS] capabilities). Used together, the two speech engines work as one, as with a call center, whose computers process customers' spoken commands and synthesize speech in return.

Microsoft Tellme continually improves its speech engines for use in more and more parts of life. The speech recognition engine learns in two ways. First, on the cloud platform, real-world interactions provide tuning data based on actual speech patterns, from traffic requests from Ford SYNC drivers to stock quote inquiries from major brokerage clients. Second, when the engine is embedded into devices, each release benefits from better acoustic models for various users and environments.

Microsoft Tellme speech engine functionality

  • Twenty-six languages
  • SRGS open standard grammar support
  • Online adaptation
  • Acoustic models optimized for speaker characteristics, input channel, and background noise
  • Large statistical language models
  • Multi-slot confidence scores for mixed-initiative dialogs

    On the Microsoft Tellme cloud platform, businesses, developers and consumers all benefit from a network effect: just like a search engine, it gets better as more people use it.
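To make the "multi-slot confidence scores for mixed-initiative dialogs" item in the list above more concrete, here is a minimal sketch of how a dialog manager might use per-slot confidence. It is not Microsoft Tellme's API or implementation. The `Slot` class, the `plan_next_turn` function, the slot names, and both threshold values are hypothetical, chosen only for illustration. The idea is that when a caller fills several slots in one utterance, each slot can be accepted, confirmed, or asked again on its own, depending on how confident the recognizer was about it.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
ACCEPT_THRESHOLD = 0.80   # at or above: use the value without confirming
CONFIRM_THRESHOLD = 0.50  # at or above (but below accept): confirm with the caller


@dataclass
class Slot:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, assumed to be reported per slot


def plan_next_turn(slots, required):
    """Sort each required slot into an accept, confirm, or ask bucket."""
    by_name = {slot.name: slot for slot in slots}
    plan = {"accept": [], "confirm": [], "ask": []}
    for name in required:
        slot = by_name.get(name)
        if slot is None or slot.confidence < CONFIRM_THRESHOLD:
            plan["ask"].append(name)        # missing or too uncertain: re-prompt
        elif slot.confidence < ACCEPT_THRESHOLD:
            plan["confirm"].append(name)    # plausible: confirm before using
        else:
            plan["accept"].append(name)     # confident: use as-is
    return plan


# One utterance that fills three of four required slots (made-up values).
slots = [
    Slot("origin", "Seattle", 0.93),
    Slot("destination", "Boston", 0.62),
    Slot("date", "tomorrow", 0.31),
]
plan = plan_next_turn(slots, ["origin", "destination", "date", "time"])
print(plan)
```

With a single confidence score for the whole utterance, the system would have to accept or reject everything at once. Scoring each slot separately lets the dialog keep what it heard well and follow up only on the uncertain or missing parts.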

    Whether you want to watch the latest superhero movie on Xbox or want to ask your Windows Phone 7 for the highest rated pizza place near you, the Microsoft Tellme cloud platform understands you and delivers. Fast.

    Live recognition for a changing world

    The Microsoft Tellme platform processes over 11 billion speech utterances from users, with requests as widely varied as the queries of a Bing search. By collecting massive amounts of real-world data with a variety of accents, acoustic environments, and semantic contexts, the platform becomes better at hearing and recognizing speech and at delivering relevant results.

    Because the Microsoft Tellme platform is Internet-based, it's constantly responding to and learning from the ever-changing world of search: the tide of news, entertainment, interests and zeitgeists that define our world and that simply couldn't be cataloged any other way.

    This is important for consumers, developers and businesses who need speech technology to understand the idiosyncrasies of modern speech and deliver relevant information. Because who would think to feed speech recognition software "LOLCAT"?

    "It dawned on me yesterday that your network is so stable, so scalable and so predictable that we never even think about it; it just works… We could never build that level of comfort ourselves."
    Global Financial Services Client

Microsoft Tellme sees a future where the service will know you: know your intent, your social and business connections, and the things that define the context that's important to you. The result will help you accomplish everyday tasks in a more natural and conversational manner. This video depicts a vision and does not make any product-specific commitments.

The Microsoft Speech Labs apply the science of natural language processing and speech recognition to products for business, life and work. Advanced, functional prototypes of speech technology forge new, natural ways of interacting with computers in real-world scenarios.

Microsoft is driven by the vision of natural user interfaces (NUI), those that incorporate gesture, touch and speech. Led by chief scientist Dr. Larry Heck, the Speech Labs embrace that promise with an emphasis on conversational dialog and voice. Drawing from his 25 years of speech science experience with Microsoft, Yahoo!, Nuance and Stanford Research Institute, Dr. Heck - with his team of some of the brightest speech scientists in the world - drives the proof-of-concept of spoken dialog systems and charts the course of next-generation elements of Microsoft's speech platform.

In Speech Labs, Dr. Heck and his team deliver exciting prototype technologies ranging from speech recognition to spoken language understanding systems to dialog management for both human/human and human/machine conversational understanding.