Studies in Massively Speaker-Specific Speech Recognition

Yu Shi, Eric Chang


Published by Institute of Electrical and Electronics Engineers, Inc.

Over the past several years, the speech recognition research community has focused primarily on speaker-independent speech recognition, with an emphasis on databases containing ever larger numbers of speakers. For example, the most recent EARS program, sponsored by DARPA, calls for recordings of thousands of speakers. In this paper, however, we are interested in making a speech interface work well for one particular individual. For this purpose, we propose using massive amounts of speaker-specific training data recorded in one’s daily life. We call this Massively Speaker-Specific Recognition (MSSR). As a preliminary study, we leverage the large corpus we have available from speech-synthesis work to study the benefit of MSSR from the acoustic-modeling aspect only. Initial results show that by changing the focus to MSSR, word error rates can drop very significantly. MSSR also performs better than a speaker-adaptive speech recognition system, since its model parameters can be tuned to suit one particular individual.