Developers

Solve a big user problem. Create a delightful UI. Do something no one’s thought of before, with voice interactions.
  • However you choose to leverage Microsoft Tellme's robust speech technology, there's an extensive set of development tools waiting for you. Whether on your servers, as a desktop application, or via our cloud platform, you'll be able to deliver a "Say it. Get it." experience to your users.

    Developers can empower applications with the same high-performance speech technology that is built into some of the most widely used products, such as Microsoft Exchange, Microsoft Lync, Microsoft Windows, Microsoft Office, and more.

  • Server Platform

    Enterprise developers, independent software vendors (ISVs), and service providers who want to build telephony applications (e.g., for contact centers) or otherwise add speech functionality to an existing server application can use the Microsoft Unified Communications Managed API (UCMA 3.0) and the Microsoft Speech Platform.

  • Desktop Platform

    Non-commercial developers who want to use the Kinect sensor as an input for Windows apps can start with the Kinect for Windows SDK. Independent and enterprise IT developers who want to add hands- and eyes-free functionality (such as dictation) to desktop applications for accessibility, automation, or entertainment can use the .NET System.Speech namespace, Windows Speech Recognition Macros, or the Speech API (SAPI).

  • Phone Platform

    Be the first to learn about speech features for Windows Phone developers by subscribing to our mobile speech developer interest list. In the meantime, start building apps for Windows Phone at the App Hub.

  • If you’re building a server application that incorporates speech with other communications technologies, such as enhanced presence, voice over Internet protocol (VoIP), instant messaging, conferencing, or telephone or video calls, you’ll find the capabilities you need in the Microsoft Unified Communications Managed API (UCMA 3.0) platform.

    If your server application has more specialized speech requirements, consider using the Speech Platform for servers directly.

  • Microsoft Unified Communications Managed API

    Use the Microsoft Unified Communications Managed API 3.0 (UCMA) to:

    • expand the capabilities of your business software and processes with communications technologies.
    • create outbound applications such as alerts, notifications or surveys.
    • create inbound interactive voice response applications and automated agents.

    UCMA 3.0 supports the development of server-side, middle-tier applications for Microsoft Lync 2010. It contains a SIP stack, a media stack, powerful speech engines for both automatic speech recognition (ASR) and speech synthesis (or TTS, text-to-speech), and a VoiceXML 2.0 interpreter. UCMA 3.0 offers developers the option of using VoiceXML either to build new functionality into a UCMA app, or to port an existing VoiceXML application to the Microsoft server platform.

    The API gives access to the presence information available in Microsoft Lync 2010, and can be used to build role agents that use the rich presence information to streamline communications between people.

    The UCMA 3.0 Core API abstracts away most of the Office Communications Server SIP/SIMPLE–based protocols, offering an API that exposes almost all of the protocols' features but is simpler to use.

    The UCMA 3.0 Speech API is a server-grade speech API that allows developers to build multi-channel speech recognition and speech synthesis–enabled applications using state-of-the-art Microsoft Tellme technology. The UCMA 3.0 Speech API supports 26 languages.

    The UCMA 3.0 Workflow API is a higher-level abstraction layer over the UCMA Core and Speech APIs. It adds unified communications (UC) Windows Workflow activities to the .NET 3.5 SP1 Windows Workflow Foundation. These activities support querying presence and building IM or speech-enabled dialogs in Workflow-based applications, such as those built on Microsoft SharePoint Server 2007.

    Windows Workflow has made it easier to develop communications-enabled business process (CEBP) applications. Microsoft Lync Server 2010 builds on .NET and on tools familiar to developers, such as the Visual Studio integrated development environment (IDE), by integrating Windows Workflow-based UC activities into the overall Windows Workflow Foundation.

    Simple drag-and-drop development is now an option for building sophisticated applications such as automated agents (a.k.a. query response bots) or IVR applications using speech technology built into the UC Managed API of Microsoft Lync Server 2010. The Visual Studio plug-in makes communication actions or information queries easy.

  • Microsoft Speech Platform for Servers

    The Microsoft Speech Platform server technologies are mainly used to develop on-premises telephony applications, such as interactive voice response (IVR) for call centers, or to add speech functionality to an existing server application. The platform is generally used as part of UCMA 3.0, although the latest version, v11, can be used separately for more specialized applications.

    Using the Speech Platform outside UCMA 3.0

    The Microsoft Speech Platform Server Runtime provides applications with server-grade speech recognition and speech synthesis in 26 languages. The platform underpins the speech capabilities of many of Microsoft’s products, including Lync, Exchange, and Microsoft Tellme services.

    The Speech Platform is available to developers either from managed code, via the .NET Microsoft.Speech namespace, or from native code, via a server version of the Speech API (SAPI). The Speech Platform SDK includes documentation for these APIs, as well as a number of tools to help developers tune and troubleshoot the speech recognition functionality of their apps.
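
    As a rough illustration of the managed route, the sketch below renders synthesized speech to a wave file through the Microsoft.Speech namespace. It assumes the Speech Platform Runtime and at least one text-to-speech language are installed, and that the project references the Microsoft.Speech assembly from the Speech Platform SDK. The module name, output file name, and spoken text are hypothetical.

    ```vbnet
    Imports Microsoft.Speech.Synthesis

    Module ServerTtsSketch
        Sub Main()
            ' Assumption: a Speech Platform TTS voice is installed on this machine.
            Using synth As New SpeechSynthesizer()
                ' Servers often have no audio device, so write to a file instead.
                ' "greeting.wav" is a hypothetical output path.
                synth.SetOutputToWaveFile("greeting.wav")

                ' Synthesize the prompt synchronously into the wave file.
                synth.Speak("Thank you for calling.")
            End Using
        End Sub
    End Module
    ```

    Writing to a file rather than a sound card is only one option; a telephony application would more typically direct the audio to a stream it controls.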

  • Video: Developers at Code Camp 2011 explore limitless possibilities with the Kinect for Windows SDK beta.

    Developers who build applications with C++, C#, or Visual Basic can use Microsoft Visual Studio 2010 and the Kinect for Windows SDK to access the capabilities of the Kinect sensor.

    If you are working in the managed environment of Microsoft .NET, then the System.Speech namespace provides the easiest access to Windows speech services.

    To extend the built-in speech recognition functionality included in Windows, you can use Windows Speech Recognition Macros or, for more advanced uses, the Microsoft Speech API (SAPI).

  • Kinect for Windows

    The Kinect for Windows SDK gives developers easy access to the capabilities of a Microsoft Kinect device connected to a computer running the Windows 7 operating system or the Windows 8 Developer Preview. It includes drivers and rich APIs for raw sensor streams and human motion tracking.

    To learn how to program your app to read audio data from the Kinect microphone and use speech recognition features, watch this Audio Fundamentals quickstart video, which comes with samples and slides. Channel 9 also features a developer-created Introduction to Kinect Speech Recognition in its Coding4Fun project gallery.

    Downloads

    To use speech features in your Kinect for Windows apps, download the files below.

    Kinect for Windows SDK
    Microsoft Speech Platform SDK v10.2
    Microsoft Speech Platform Runtime v10.2
    Kinect for Windows Language Pack, v0.9

  • .NET System.Speech Namespace

    The .NET System.Speech namespace provides the APIs to develop desktop applications using the speech recognition and synthesis capabilities of Windows. It is chiefly for independent developers who create vertical applications for hands- or eyes-free use, notably dictation, accessibility, or automation applications. The System.Speech namespace is part of the .NET runtime.

    The speech functionality is broken down into two logical groups: speech recognition and speech synthesis.

    The managed System.Speech namespace is designed to be intuitive and to help developers take advantage of .NET programming languages. With only two lines of code in a Visual Basic .NET application, your computer can speak any text provided:

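    Module TTSHelloWorld
        Sub Main()
            ' Requires a project reference to the System.Speech assembly.
            Dim ttsVoice As New System.Speech.Synthesis.SpeechSynthesizer()
            ' Speak the text through the default audio device.
            ttsVoice.Speak("Hello world!")
        End Sub
    End Module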

    The .NET Framework makes the process easier than writing a SAPI-based native application. The namespace includes support for creating recognizer instances, loading grammars, and handling speech events.
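
    Those three steps (creating a recognizer, loading a grammar, and handling a speech event) might look roughly like the following sketch. It assumes a Windows machine with a built-in recognition engine whose language matches the current system culture, a working default microphone, and a project reference to the System.Speech assembly. The module name and command phrases are hypothetical.

    ```vbnet
    Imports System
    Imports System.Speech.Recognition

    Module RecognitionSketch
        Sub Main()
            ' Create an in-process recognizer that uses the default installed engine.
            Using recognizer As New SpeechRecognitionEngine()
                ' Build a small command grammar from a fixed list of phrases (illustrative only).
                Dim commands As New Choices("open file", "close file", "exit")
                recognizer.LoadGrammar(New Grammar(New GrammarBuilder(commands)))

                ' Subscribe to the event raised each time a phrase is recognized.
                AddHandler recognizer.SpeechRecognized, AddressOf OnSpeechRecognized

                ' Listen on the default microphone until the user presses Enter.
                recognizer.SetInputToDefaultAudioDevice()
                recognizer.RecognizeAsync(RecognizeMode.Multiple)
                Console.WriteLine("Say a command, or press Enter to quit.")
                Console.ReadLine()

                ' Stop listening before the recognizer is disposed.
                recognizer.RecognizeAsyncCancel()
            End Using
        End Sub

        ' Prints the recognized phrase and the engine's confidence score.
        Private Sub OnSpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
            Console.WriteLine("Heard: {0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence)
        End Sub
    End Module
    ```

    A fixed phrase list keeps the sketch small; an application that needs free-form input could load a DictationGrammar instead.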

    As a part of the Microsoft .NET Framework, System.Speech supports 32- and 64-bit development under Windows 7, Windows Server 2008, Windows Server 2003, Windows Vista, and Windows XP. Note that only Windows 7 and Windows Vista ship with a built-in speech recognition engine. System.Speech applications, like native SAPI applications, can take advantage of any third-party speech engine that supports SAPI 5.1 or later.

    Resources

    To get started developing within the managed .NET environment, refer to the documentation of the System.Speech namespace on MSDN. Complete documentation, including sample source code, is included in the Windows SDK.

    System.Speech Namespace on MSDN
    Windows SDK

  • Speech Recognition (WSR) Macros

    The WSR Macros tool is for independent and enterprise IT developers who want an easy way to extend the built-in speech recognition commands in Windows without writing code or learning an API. With WSR Macros, it’s easy to configure phrases Windows should listen for and the corresponding keystrokes or actions Windows should perform.

  • Speech API (SAPI)

    The Microsoft Speech API (SAPI) is for advanced Windows native code development: incorporating sophisticated, specialized speech functionality into Windows applications. Typically, SAPI developers build hands- or eyes-free abilities into Windows applications for accessibility and automation.

    SAPI is a component object model (COM)–based API for native Windows applications. It includes dozens of objects and interfaces that can be used by applications to listen for speech, recognize content, process spoken commands, and speak text. SAPI is most easily used from applications written using Microsoft Visual C++. It works with development frameworks such as the Microsoft Foundation Classes (MFC) and the Active Template Library (ATL).

    The Speech API has been an integral component of all Microsoft Windows versions since Windows 98. Microsoft Windows XP and Windows Server 2003 include SAPI version 5.1. Windows Vista and Windows Server 2008 include SAPI version 5.3, while Windows 7 includes SAPI version 5.4. Code written for SAPI 5.3 (Vista) will run on SAPI 5.4 (Windows 7) without recompiling.

    Resources

    To get started developing for speech, refer to the documentation on MSDN. You’ll also need the Windows SDK, which includes sample source code.

    SAPI 5.4 (Windows 7)
    SAPI 5.3 (Windows Vista)

    Demos

    Header files, libraries, and code samples are included in the Windows SDK.