Abstract

This paper reports on experiments to quantify the benefits of large training databases for non-English HMM-based keyword spotting. The research was motivated by the lack of such databases for many non-English languages, and aims to determine whether the gains in keyword spotting performance justify the significant cost and delay of creating these databases. HMM-based keyword spotting experiments performed for English, Spanish and Indonesian found that although some gains in performance can be obtained through increased training database size, the magnitude of these gains may not justify the effort and incurred delay of constructing such databases. This has ramifications for the immediate development and deployment of non-English keyword spotting systems.