Frank Soong

Principal Researcher and Research Manager, Speech Group


Frank Soong is a Principal Researcher and Manager of the Speech Group, where research on speech modeling, recognition, and synthesis is conducted.


Voice Conversion with Neural Network

Established: March 24, 2014

Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion

Neural network (NN) based voice conversion, which employs a nonlinear function to map the features from a source to a target speaker, has been shown to outperform the GMM-based voice conversion approach. However, there are still limitations to be overcome in NN-based voice conversion: the NN is trained on a frame error (FE) minimization criterion and the corresponding weights are adjusted to minimize the error squares…

Turning a Monolingual Speaker Into Multi-Lingual Speaker

Established: February 21, 2012

A voice user interface needs to output responses as Text-To-Speech (TTS) synthesized speech. Sometimes it is even more desirable to have the response in mixed languages. For example, in a foreign country, it would be convenient if a user of a car-navigation system who is not fluent in the local language could hear instructions in mixed codes, with entities such as street names synthesized in the local language and routing directions in the user’s native language. Voice…







He received his BS, MS, and Ph.D. degrees, all in electrical engineering, from the National Taiwan University, the University of Rhode Island, and Stanford University, respectively. He joined Bell Labs Research, Murray Hill, NJ, USA in 1982, worked there for 20 years, and retired as a Distinguished Member of Technical Staff in 2001. At Bell Labs, he worked on various aspects of acoustics and speech processing, including speech coding, speech and speaker recognition, stochastic modeling of speech signals, efficient search algorithms, discriminative training, dereverberation of audio and speech signals, microphone array processing, acoustic echo cancellation, and hands-free noisy speech recognition. He was also responsible for transferring recognition technology from research to AT&T voice-activated cell phones, which were rated by Mobile Office magazine as the best among the competing products evaluated. He was a co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package.

He visited Japan twice as a visiting researcher: first from 1987 to 1988, at the NTT Electro-Communication Labs, Musashino, Tokyo; then from 2002 to 2004, at the Spoken Language Translation Labs, ATR, Kyoto. In 2004, he joined Microsoft Research Asia (MSRA), Beijing, China, to lead the Speech Research Group. He is a visiting professor at the Chinese University of Hong Kong (CUHK) and the co-director of the CUHK-MSRA Joint Research Lab, recently promoted to a National Key Lab of the Ministry of Education, China.

He was the co-chair of the 1991 IEEE International Arden House Speech Recognition Workshop. He is a committee member of the IEEE Speech and Language Processing Technical Committee of the Signal Processing Society and has served as an associate editor of the IEEE Transactions on Speech and Audio Processing. He has published extensively and has coauthored more than 200 technical papers in the speech and signal processing fields. He is an IEEE Fellow.

Speech Group’s home page.