Microsoft Speech services are now generally available. Part of Azure Cognitive Services, Speech offers complete speech capabilities, including speech recognition, translation, and text-to-speech, in a set of unified and customizable services. It combines the capabilities of the existing Microsoft Translator Speech API, Bing Speech API, and Custom Speech Service (preview).
Speech is enterprise-ready and scales with your needs, from prototyping to production. You can add it to your apps, websites, and workflows through an Azure subscription.
Speech supports 11 speech-to-speech translation languages. Speech from any of those 11 languages can also be translated into more than 60 text languages. Lists of supported languages for translation, speech recognition, and text-to-speech can be found in the Speech services documentation.
Customizable end-to-end solution
Like the Microsoft Translator Speech API, the Speech translation service combines all the elements needed for speech translation in one integrated service: speech recognition with TrueText text normalization, text translation through the Microsoft Translator service, and text-to-speech.
In addition, speech translations are customizable at each level, from input speech recognition to translation to output text-to-speech.
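To make the stage-by-stage structure concrete, here is a minimal Python sketch of how the steps chain together. Everything in it is hypothetical: `SpeechTranslationPipeline` and the `fake_*` stubs are illustrative names, not part of any Speech SDK, and the real service runs these stages for you. The point is only that each stage sits behind its own interface, which is what makes it possible to swap in a customized model at a single level.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the real service performs these steps itself.
Recognize = Callable[[bytes], str]        # audio -> raw text
Normalize = Callable[[str], str]          # raw text -> cleaned text
Translate = Callable[[str, str], str]     # (text, target language) -> text
Synthesize = Callable[[str, str], bytes]  # (text, target language) -> audio


@dataclass
class SpeechTranslationPipeline:
    """Chains the stages; any one of them can be replaced by a customized version."""
    recognize: Recognize
    normalize: Normalize
    translate: Translate
    synthesize: Synthesize

    def run(self, audio: bytes, target_lang: str) -> tuple[str, bytes]:
        raw_text = self.recognize(audio)                        # speech recognition
        clean_text = self.normalize(raw_text)                   # TrueText-style cleanup
        translated = self.translate(clean_text, target_lang)    # machine translation
        speech_out = self.synthesize(translated, target_lang)   # text-to-speech
        return translated, speech_out


# Stub stages for illustration only; they do no real speech processing.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_normalize(text: str) -> str:
    return text.strip().capitalize() + "."


def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def fake_synthesize(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(fake_recognize, fake_normalize, fake_translate, fake_synthesize)
text, audio = pipeline.run(b"  hello world ", "de")
print(text)  # [de] Hello world.
```

Replacing one field, for example passing a different `translate` function, changes that stage without touching the others.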
Speech recognition and TrueText normalization: Convert speech audio into text
The speech audio is processed and converted into raw text. TrueText then normalizes that text to make it better suited for translation: it removes speech disfluencies (filler words such as “um” and “ah”), stutters, and repetitions, and it makes the text more readable and translatable by adding sentence breaks, proper punctuation, and capitalization.
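As a rough illustration of the kind of cleanup described above, the following toy normalizer drops a small, assumed list of filler words, collapses immediate repetitions, and adds capitalization and a final period. It is not TrueText, whose actual rules are not described here; `normalize_transcript` and `FILLERS` are hypothetical names.

```python
import re

# Hypothetical filler list; a production system would use a much richer model.
FILLERS = {"um", "uh", "ah", "er"}


def normalize_transcript(raw: str) -> str:
    """Toy TrueText-style cleanup (not the real TrueText algorithm)."""
    words = re.findall(r"[A-Za-z']+", raw)  # split into word tokens, dropping punctuation
    cleaned: list[str] = []
    for word in words:
        if word.lower() in FILLERS:
            continue  # drop filler words such as "um" and "ah"
        if cleaned and cleaned[-1].lower() == word.lower():
            continue  # collapse stutters and immediate repetitions ("I I")
        cleaned.append(word)
    if not cleaned:
        return ""
    sentence = " ".join(cleaned)
    # Capitalize the first letter and close with a period.
    return sentence[0].upper() + sentence[1:] + "."


print(normalize_transcript("um so so I I want ah to book a a flight"))
# So I want to book a flight.
```

A real normalizer needs far more context than this: the simple repeat rule would wrongly collapse a legitimate "had had", and the sketch does not insert sentence breaks.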
Speech recognition can be customized with Custom Speech. Users can build custom language models tailored to their own vocabulary and speaking style, and custom acoustic models that adapt recognition to their environment, such as different microphones, sampling rates, or levels of background noise.
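One way to picture why a vocabulary-tailored language model helps is n-best rescoring: the recognizer proposes several transcriptions, and a model that knows the user's terms tips the choice toward the one containing them. The sketch below uses a simple per-term bonus as a stand-in for a language model. It is a conceptual illustration with made-up scores and a hypothetical `rescore` function, not a description of how Custom Speech trains or applies its models.

```python
def rescore(hypotheses: dict[str, float], domain_terms: set[str], boost: float = 0.5) -> str:
    """Return the hypothesis with the highest base score plus a bonus per domain term.

    The per-term bonus is a crude stand-in for a custom language model.
    """
    def combined(text: str) -> float:
        hits = sum(word in domain_terms for word in text.lower().split())
        return hypotheses[text] + boost * hits

    return max(hypotheses, key=combined)


# Made-up recognizer scores (higher is better) for two competing transcriptions.
candidates = {
    "ship the contoso order today": -4.2,
    "ship the can't so order today": -4.0,
}

print(rescore(candidates, set()))        # ship the can't so order today
print(rescore(candidates, {"contoso"}))  # ship the contoso order today
```

Without domain knowledge the acoustically preferred but wrong transcription wins; knowing that "contoso" is a likely term flips the decision.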
Machine Translation: Translate the text
The recognized text is translated using neural machine translation developed specifically for real-life spoken conversations.
Custom Translator (preview) lets users customize Translator's neural translation systems so that they understand the terminology used in a company or industry.
Systems customized with Custom Translator can be used for both speech translations and text translations using the Microsoft Translator’s Text API.
Text-to-speech: Produce audio from the translated text
Text-to-speech, or voice synthesis, creates computer-generated audio output from the translated text. Users can choose from more than 75 voices in over 45 languages or locales, including options for male and female voices.
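Picking an output voice amounts to matching the translation's target locale, and optionally a preferred gender, against the available voices. The sketch below assumes a small hypothetical catalog: `Voice`, `CATALOG`, and `pick_voice` are illustrative names, and the voice names are placeholders rather than real service voice identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Voice:
    name: str    # placeholder identifier, not a real service voice name
    locale: str  # locale tag, e.g. "de-DE"
    gender: str  # "Female" or "Male"


# Hypothetical catalog; the real voice list is in the service documentation.
CATALOG = [
    Voice("voice-de-1", "de-DE", "Female"),
    Voice("voice-de-2", "de-DE", "Male"),
    Voice("voice-fr-1", "fr-FR", "Female"),
]


def pick_voice(catalog: list[Voice], locale: str, gender: Optional[str] = None) -> Voice:
    """Return the first voice matching the locale (and gender, if one is requested)."""
    for voice in catalog:
        if voice.locale == locale and (gender is None or voice.gender == gender):
            return voice
    raise LookupError(f"no voice for locale={locale!r}, gender={gender!r}")


print(pick_voice(CATALOG, "de-DE", "Male").name)  # voice-de-2
print(pick_voice(CATALOG, "fr-FR").name)          # voice-fr-1
```

Raising an error when nothing matches, rather than silently falling back to another language, keeps a missing locale visible; a caller could instead fall back to text-only output.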
With Custom Voice, users can also customize the voice by recording and uploading training data. The service creates a unique voice tuned to those recordings.
Get started with unified Speech
The Speech services documentation includes quickstarts, tutorials, and how-to guides to help you add the service to your app.