Search organization in the Whisper continuous speech recognition system
Since the earliest days of computing, automatic speech recognition technology has ridden the wave that has come to be known as Moore’s law. This is vividly illustrated by the market introduction of several general-purpose continuous speech recognition products. Besides Moore’s law, two developments have made this possible: advances in acoustic modeling, especially adaptation technologies, and advances in decoding techniques that permit real-time performance on today’s PCs. This paper discusses our approach to the decoding problem, including the role of heuristic pruning, the A* criterion and its relation to Viterbi searching and stack searching, as well as our approach to problems relating to prefix-tree searching and the application of language models and complex acoustic structures, such as cross-word triphones.