Abstract

A speech recognizer is developed that uses a layered neural network to perform speech-frame prediction and a Markov chain to modulate the network's weight parameters. We postulate that speech recognition accuracy is closely linked to the predictive model's ability to represent long-term temporal correlations in the data. To assess how faithfully the models reproduce the correlation structure of actual speech data, analytical expressions are derived for the correlation functions of several types of predictive models (linear, nonlinear, and jointly linear and nonlinear). The analytical results, computer simulations, and speech recognition experiments suggest that when nonlinear and linear prediction are performed jointly within the same layer of the neural network, the model better captures long-term data correlations and consequently improves speech recognition performance.
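The idea of a Markov-modulated predictor whose layer combines linear and nonlinear terms can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: frames are scalars, prediction is first order, the nonlinearity is a logistic sigmoid, and the model is driven by Gaussian noise. The names `StatePredictor`, `generate`, and `autocorrelation` are hypothetical. The sketch estimates correlation functions empirically from simulated sequences, whereas the paper derives them analytically, so it should be read as an illustration of the setup and not as the authors' model.

```python
import math
import random
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    # Logistic sigmoid written via tanh so it cannot overflow for large |x|.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


@dataclass
class StatePredictor:
    """Hypothetical per-state weights for a first-order scalar frame predictor."""
    a: float  # weight of the linear term
    b: float  # weight of the nonlinear (sigmoid) term
    c: float  # bias

    def predict(self, x_prev: float) -> float:
        # Linear and nonlinear terms are computed side by side and summed,
        # i.e. both live in the same layer.
        return self.a * x_prev + self.b * sigmoid(x_prev) + self.c


def generate(predictors, transition, n, noise_std=1.0, seed=0):
    """Run the predictor autonomously, driven by Gaussian noise, while a
    Markov chain selects which weight set is active at each frame."""
    rng = random.Random(seed)
    state, x, frames = 0, 0.0, []
    for _ in range(n):
        x = predictors[state].predict(x) + rng.gauss(0.0, noise_std)
        frames.append(x)
        # Markov transition: row `state` of `transition` gives next-state probabilities.
        state = rng.choices(range(len(predictors)), weights=transition[state])[0]
    return frames


def autocorrelation(xs, max_lag):
    """Biased sample autocorrelation r[k] for k = 0..max_lag (r[0] == 1)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [v - mean for v in xs]
    var = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * var)
            for k in range(max_lag + 1)]


if __name__ == "__main__":
    # Two-state chain that tends to stay in its current state (assumed values).
    transition = [[0.95, 0.05], [0.05, 0.95]]
    linear = [StatePredictor(0.9, 0.0, 0.0), StatePredictor(0.5, 0.0, 0.0)]
    joint = [StatePredictor(0.9, 0.5, -0.25), StatePredictor(0.5, 0.5, -0.25)]
    for name, model in [("linear", linear), ("joint", joint)]:
        r = autocorrelation(generate(model, transition, 5000), 10)
        print(name, [round(v, 2) for v in r])
```

Comparing the printed correlation sequences against those measured on real speech frames is one way to probe how faithfully each predictor type reproduces the data's temporal correlations. The weights above are arbitrary illustrative values, so the numbers they produce say nothing about the paper's results.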