Recent Improvements on Microsoft's Trainable Text-to-Speech System: Whistler
- Xuedong Huang
- Alex Acero
- Hsiao-Wuen Hon
- Y. C. Ju
- J. Liu
- S. Meredith
- M. Plumpe
Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing
Published by Institute of Electrical and Electronics Engineers, Inc.
The Whistler text-to-speech engine was designed so that its model parameters can be constructed automatically from training data [7]. This paper focuses on recent improvements in prosody and acoustic modeling, all of which are derived through probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the creation of generic TTS systems for a new language, a new voice, or a new speech style. The Whistler TTS engine supports the Microsoft Speech API [10] and requires less than 3 MB of working memory.
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.