Abstract

While context-free grammars (CFGs) remain one of the most important formalisms for interpreting natural language, word n-gram models are surprisingly powerful for domain-independent applications. We propose to unify these two formalisms for both speech recognition and spoken language understanding (SLU). With portability as the major concern, we incorporate domain-specific CFGs into a domain-independent n-gram model, which improves the generalizability of the CFG and the specificity of the n-gram. In our experiments, the unified model significantly reduces the test set perplexity from 378 to 90 in comparison with a domain-independent word trigram. The unified model also converges well when domain-specific data become available: the perplexity can be further reduced from 90 to 65 with a limited amount of domain-specific data. While we have demonstrated excellent portability, the full potential of our approach lies in the unified recognition and understanding that we are now investigating.
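To make the unification concrete, the following minimal sketch illustrates one way such a model can factorize, under the assumption (not stated in the abstract) that domain-specific CFG nonterminals are treated as single tokens inside the domain-independent n-gram, with the words each nonterminal spans scored by a probabilistic CFG, much like a class-based n-gram. All names and probabilities are toy values for illustration, not the paper's implementation.

```python
from math import log

# Toy domain-independent bigram over words and CFG nonterminals.
# "<CITY>" is a nonterminal token standing in for any city name.
BIGRAM = {
    ("<s>", "fly"): 0.2,
    ("fly", "to"): 0.5,
    ("to", "<CITY>"): 0.3,   # nonterminal used like an ordinary word token
    ("<CITY>", "</s>"): 0.6,
}

# Toy PCFG lookup: P(word sequence | nonterminal). A real grammar would be
# recursive; a flat table is enough to show the factorization.
PCFG = {
    "<CITY>": {("seattle",): 0.5, ("new", "york"): 0.5},
}

def log_prob(tagged):
    """Score a sentence given a segmentation into plain words and
    (nonterminal, spanned-words) pairs:
      P(sentence) = prod P(token | prev) * prod P(span | nonterminal)
    """
    lp, prev = 0.0, "<s>"
    for item in tagged + ["</s>"]:
        if isinstance(item, tuple):          # (nonterminal, words it covers)
            nt, words = item
            lp += log(BIGRAM[(prev, nt)]) + log(PCFG[nt][words])
            prev = nt
        else:                                # ordinary word
            lp += log(BIGRAM[(prev, item)])
            prev = item
    return lp

# "fly to seattle", with "seattle" analyzed by the <CITY> grammar:
print(log_prob(["fly", "to", ("<CITY>", ("seattle",))]))
```

Under this factorization, porting to a new domain only requires swapping the domain-specific grammar table, while the domain-independent n-gram is reused unchanged, which is consistent with the portability claim above.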