Teaching Computers to Speak in Tongues

Published March 8, 2012


Posted by Kevin Schofield

Perhaps it’s just me, but I wince whenever my American phone or GPS tries to pronounce a French restaurant name, a Spanish street name, or the name of one of my non-American friends. While we’ve made much progress in creating text-to-speech (TTS) systems with human-sounding voices in a comforting accent, those systems haven’t fared well in our multilingual world.

Frank Soong and his team from Microsoft Research Asia have been working to solve that problem by “cross-training” a text-to-speech system so that it can correctly pronounce words from multiple languages, even if it was built from the voice samples of someone who speaks only one.

Frank gave a demo of their TTS system during Rick Rashid’s keynote speech at Techfest yesterday. You can see it in the video below.

This is another great example of how the next generation of technologies will continue to make interacting with computers more natural and help blend physical and virtual elements more seamlessly.
