Stacking Model-Based Korean Prosodic Phrasing Using Speaker Variability Reduction and Linguistic Feature Engineering

Jinsik Lee; Sungjin Lee; Jonghoon Lee; Byeongchang Kim; Gary Geunbae Lee

ACM Transactions on Asian Language Information Processing (TALIP) | September 2012

This article presents a prosodic phrasing model for a general-purpose Korean speech synthesis system. To capture the factors that affect prosodic phrasing, linguistically motivated machine-learning features were investigated and incorporated using a stacking model. Feature engineering further improved phrasing performance. The experiments use a corpus of 4,392 sentences (55,015 words, an average of about 13 words per sentence). Because the corpus contains speaker-dependent variability, which is not appropriate to reflect in a general-purpose speech synthesis system, a method to reduce this variability is proposed. In addition, the entire data set used in the experiments is made publicly available for future comparative research.