In this paper, we show how new training principles and optimization techniques for neural networks can be used for different network structures. In particular, we revisit the Recurrent Neural Network (RNN), which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM. We apply pretraining principles used for Deep Neural Networks (DNNs) and second-order optimization techniques to train an RNN. Moreover, we explore its application in the Aurora2 speech recognition task under mismatched noise conditions using a Tandem approach. We observe top performance on clean speech, and under high noise conditions, compared to multi-layer perceptrons (MLPs)

and DNNs, with the added benefit of being a “deeper” model than an MLP but more compact than a DNN.