Text-to-speech synthesis technology for European Portuguese and Brazilian Portuguese
Our goal in these projects is to develop text-to-speech (TTS) systems for European Portuguese and Brazilian Portuguese. Each project involves three major milestones: voice font recording, front-end building and back-end integration.
Some of the problems related to voice font acquisition involve decisions on the sex, age and dialect variety of the voice talent, and on the quality and intelligibility of the talent's voice. Other frequent problems concern the choice of the scripts to be read and recorded by the voice talent, as well as the number of recording hours.
The front-end component includes document structure detection (sentence breaking and paragraph segmentation affect intonation and prosodic phrasing), normalization (the conversion of abbreviations, acronyms, cardinal and ordinal numbers, dates, monetary amounts and currencies, and mathematical expressions into words), letter-to-sound conversion, stressed syllable marking, syllabic division, homograph disambiguation (where morphological and syntactic analysis play an essential role) and the pronunciation of foreign words. The prosody generator is another important component of the TTS architecture: it predicts intonation, emotion, attitude and style.
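As an illustration of the normalization step, the following minimal Python sketch expands plain cardinal numbers (0 to 999,999) into Portuguese words. It is not the system described here. The function names `cardinal_to_words` and `normalize_numbers` and the variety tags `pt-PT`/`pt-BR` are hypothetical. The sketch only produces masculine forms and models a single European/Brazilian difference: the teens 16, 17 and 19 (dezasseis, dezassete, dezanove in European Portuguese versus dezesseis, dezessete, dezenove in Brazilian Portuguese). Gender agreement (duas, duzentas), ordinals, dates, decimals and digit separators are left out, although a real front-end would need morphological context to handle them correctly.

```python
"""Minimal sketch (hypothetical names): cardinal-number normalization for Portuguese.

Assumptions: masculine forms only, range 0..999999, and the only EP/BP
difference modelled is the spelling of 16, 17 and 19.
"""
import re

UNITS = ["zero", "um", "dois", "três", "quatro", "cinco", "seis", "sete", "oito", "nove"]
TEENS_COMMON = {10: "dez", 11: "onze", 12: "doze", 13: "treze",
                14: "catorze", 15: "quinze", 18: "dezoito"}
# Teens whose spelling differs between European (pt-PT) and Brazilian (pt-BR) Portuguese.
TEENS_VARIETY = {
    "pt-PT": {16: "dezasseis", 17: "dezassete", 19: "dezanove"},
    "pt-BR": {16: "dezesseis", 17: "dezessete", 19: "dezenove"},
}
TENS = {2: "vinte", 3: "trinta", 4: "quarenta", 5: "cinquenta",
        6: "sessenta", 7: "setenta", 8: "oitenta", 9: "noventa"}
HUNDREDS = {1: "cento", 2: "duzentos", 3: "trezentos", 4: "quatrocentos", 5: "quinhentos",
            6: "seiscentos", 7: "setecentos", 8: "oitocentos", 9: "novecentos"}


def _below_100(n, variety):
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS_COMMON.get(n) or TEENS_VARIETY[variety][n]
    tens, unit = divmod(n, 10)
    # Tens and units are joined by "e": 21 -> "vinte e um".
    return TENS[tens] if unit == 0 else f"{TENS[tens]} e {UNITS[unit]}"


def _below_1000(n, variety):
    if n < 100:
        return _below_100(n, variety)
    if n == 100:
        return "cem"  # exactly 100 is "cem"; 101..199 use "cento"
    hundreds, rest = divmod(n, 100)
    return HUNDREDS[hundreds] if rest == 0 else f"{HUNDREDS[hundreds]} e {_below_100(rest, variety)}"


def cardinal_to_words(n, variety="pt-PT"):
    """Spell out 0 <= n <= 999999 in the given variety (masculine forms only)."""
    if variety not in TEENS_VARIETY:
        raise ValueError(f"unknown variety: {variety}")
    if not 0 <= n <= 999_999:
        raise ValueError("only 0..999999 is covered by this sketch")
    if n < 1000:
        return _below_1000(n, variety)
    thousands, rest = divmod(n, 1000)
    head = "mil" if thousands == 1 else f"{_below_1000(thousands, variety)} mil"  # 1000 is "mil", not "um mil"
    if rest == 0:
        return head
    # "e" follows "mil" only when the remainder is below 100 or a round hundred:
    # 1001 -> "mil e um", 1100 -> "mil e cem", but 1250 -> "mil duzentos e cinquenta".
    joiner = " e " if rest < 100 or rest % 100 == 0 else " "
    return head + joiner + _below_1000(rest, variety)


def normalize_numbers(text, variety="pt-PT"):
    """Replace standalone runs of 1-6 digits with words; other number formats are left untouched."""
    return re.sub(r"\b\d{1,6}\b",
                  lambda m: cardinal_to_words(int(m.group()), variety), text)
```

For example, `normalize_numbers("A sala tem 21 lugares")` returns `"A sala tem vinte e um lugares"`, and `cardinal_to_words(16, "pt-BR")` returns `"dezesseis"`. Passing the variety as a parameter is one way to let the two systems share most normalization rules while keeping the few lexical differences separate.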
The back-end consists mainly of a signal processing system that produces the audio speech signal. This system can take into account a wide range of parameters which, when properly combined, lead to high-quality output.