Bing Speech API

Convert audio to text, understand intent, and convert text back to speech for natural responsiveness.


Speech Recognition

Convert spoken audio to text. The API can recognize audio from the microphone in real time, from another real-time audio source, or from a file. In all cases, real-time streaming is available, so partial recognition results are returned while the audio is still being sent to the server. The Speech to Text API enables you to build smart, voice-triggered apps. To see how it works, select your target language, then click the microphone and start speaking, or click one of the sample speech phrases. By using this demo, you consent to providing your voice input data to Microsoft for service improvement purposes.

To try the demo with your own voice and a microphone, use a browser with WebRTC support, such as a recent version of Microsoft Edge, Firefox, or Chrome.

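As a rough illustration of how an app might consume partial results while audio is still streaming, here is a minimal Python sketch. It does not call the service: `RecognitionEvent`, `simulated_events`, and `collect_transcript` are hypothetical names invented for this example, and the canned hypotheses stand in for whatever a real streaming session would return.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class RecognitionEvent:
    """Hypothetical event: one hypothesis received during streaming."""
    text: str
    is_final: bool


def simulated_events() -> Iterator[RecognitionEvent]:
    """Stand-in for a real streaming session; yields canned hypotheses."""
    yield RecognitionEvent("what", False)
    yield RecognitionEvent("what is the", False)
    yield RecognitionEvent("what is the weather", False)
    yield RecognitionEvent("What is the weather like today?", True)


def collect_transcript(
    events: Iterable[RecognitionEvent],
    on_partial: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Pass partial hypotheses to on_partial and return the final results."""
    finals: List[str] = []
    for event in events:
        if event.is_final:
            finals.append(event.text)
        elif on_partial is not None:
            # Partial results can update the UI before the utterance ends.
            on_partial(event.text)
    return finals


if __name__ == "__main__":
    results = collect_transcript(
        simulated_events(), on_partial=lambda t: print("partial:", t)
    )
    print("final:", results)
```

Keeping partial and final results on separate paths lets an interface show live feedback without treating an unfinished hypothesis as the transcript.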

Text to Speech

Convert text to spoken audio. When applications need to “talk” back to their users, this API can convert text generated by the app into audio that can be played back to the user. The Text-To-Speech API enables you to build smart apps that can speak. To test it, choose your target language, add your sentences, then click the play button to hear how speech synthesis works. By using this demo, you consent to providing your voice input data to Microsoft for service improvement purposes.
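The pricing options on this page list text to speech at up to 1000 characters per transaction, so longer app-generated text would need to be split before it is sent. The following Python sketch shows one possible way to do that. The helper name `split_for_tts` is hypothetical, and breaking on whitespace is an assumption made for this example, not a documented requirement of the API.

```python
from typing import List

MAX_CHARS = 1000  # per-transaction limit quoted in the pricing options


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, breaking on whitespace.

    Runs of whitespace are collapsed to single spaces. A single word longer
    than max_chars is hard-split, since it cannot fit in one chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        while len(word) > max_chars:
            # Flush what we have, then emit the oversized word in pieces.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = split_for_tts("word " * 500)
    print(len(parts), [len(p) for p in parts])
```

Each chunk would then count as one transaction, so the number of chunks also gives a rough estimate of billable requests.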

Speech Intent Recognition

Convert spoken audio to intent. This is similar to Speech Recognition: in addition to returning recognized text from the audio input, the server returns structured information about the incoming speech so that apps can easily parse the speaker's intent and drive further action. Models trained with the Microsoft Language Understanding Intelligent Service (LUIS) are used to generate the intent.
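To make "parse the intent and drive further action" concrete, here is a small Python sketch of an app dispatching on a structured result. The JSON payload and its field names (`text`, `intent`, `entities`) are assumptions made up for this example; they are not the documented response format of the service or of LUIS. The handler names are likewise hypothetical.

```python
import json
from typing import Callable, Dict

# Hypothetical payload: these field names are illustrative assumptions,
# not the documented response format.
SAMPLE_RESPONSE = """
{
  "text": "turn on the kitchen lights",
  "intent": {"name": "TurnOn", "score": 0.97},
  "entities": [{"type": "Room", "value": "kitchen"}]
}
"""


def handle_turn_on(entities: Dict[str, str]) -> str:
    return "Turning on lights in " + entities.get("Room", "unknown room")


def handle_unknown(entities: Dict[str, str]) -> str:
    return "Sorry, I did not understand that."


# Map intent names to the actions the app should take.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "TurnOn": handle_turn_on,
}


def dispatch(raw: str, min_score: float = 0.5) -> str:
    """Parse the structured result and run the matching handler."""
    result = json.loads(raw)
    intent = result.get("intent", {})
    entities = {e["type"]: e["value"] for e in result.get("entities", [])}
    # Low-confidence intents fall back to the unknown handler.
    if intent.get("score", 0.0) < min_score:
        return handle_unknown(entities)
    handler = HANDLERS.get(intent.get("name"), handle_unknown)
    return handler(entities)


if __name__ == "__main__":
    print(dispatch(SAMPLE_RESPONSE))
```

A confidence threshold such as `min_score` is a design choice for this sketch; a real app would tune it against its own intent model.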

Pricing options

Plan                    Description                             Price
Free                    5,000 transactions per month            Free
Text to speech          Up to 1,000 characters per transaction  $4 per 1,000 transactions
Short form recognition  Up to 15 seconds per transaction        $4 per 1,000 transactions
Long form recognition   Up to 2 minutes per transaction         0-10 hours at $9 per hour; 10-100 hours at $7.50 per hour; over 100 hours at $5.50 per hour
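The long form recognition price is tiered by hours, which is easier to follow with a worked calculation. The Python sketch below assumes the tiers are graduated, meaning each rate applies only to the hours that fall inside its band. The pricing text does not say whether that is the case, so treat this reading and the helper name `long_form_cost` as assumptions.

```python
from decimal import Decimal

# (upper bound in hours, price per hour); None means "no upper bound".
# Assumption: tiers are graduated, so each rate covers only its own band.
TIERS = [
    (Decimal("10"), Decimal("9.00")),
    (Decimal("100"), Decimal("7.50")),
    (None, Decimal("5.50")),
]


def long_form_cost(hours) -> Decimal:
    """Return the estimated long form recognition cost in dollars."""
    hours = Decimal(str(hours))
    if hours < 0:
        raise ValueError("hours must be non-negative")
    cost = Decimal("0")
    lower = Decimal("0")
    for upper, rate in TIERS:
        if hours <= lower:
            break
        top = hours if upper is None else min(hours, upper)
        cost += (top - lower) * rate
        if upper is not None:
            lower = upper
    return cost


if __name__ == "__main__":
    for h in (5, 50, 150):
        print(h, "hours:", long_form_cost(h))
```

Under this reading, 50 hours would come to $390 (10 × $9 plus 40 × $7.50), and 150 hours to $1,040 (10 × $9 plus 90 × $7.50 plus 50 × $5.50).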

Developer resources for Bing Speech API