Abstract

We describe a set of novel batch-mode algorithms developed recently as one key component of scalable, deep-neural-network-based speech recognition. The essence of these algorithms is to structure the single-hidden-layer neural network so that the upper layer's weights can be written as a deterministic function of the lower layer's weights. This structure is exploited effectively during training by plugging the deterministic function into the least-squares error objective function when calculating the gradients. Acceleration techniques are further exploited to move the weight updates along the most promising directions. Experiments on TIMIT frame-level phone and phone-state classification show strong results. In particular, the error rate decreases strictly monotonically as the mini-batch size increases. This demonstrates the potential of the proposed batch-mode algorithms for large-scale speech recognition, since they are easily parallelizable across computers.
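
To make the core idea concrete, the following is a minimal sketch of one batch-mode gradient computation under the stated structure: sigmoid hidden units, a linear output layer, and upper-layer weights obtained as the (ridge-regularized) least-squares solution given the current lower-layer weights. The variable names (`W`, `X`, `T`, `ridge`) and the ridge term are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient(W, X, T, ridge=1e-6):
    """One batch-mode objective/gradient evaluation for a single-hidden-layer
    network whose upper-layer weights U are a deterministic (least-squares)
    function of the lower-layer weights W.

    W : (D, L) lower-layer weights
    X : (D, N) input frames (columns are samples)
    T : (C, N) target vectors (e.g., one-hot phone or phone-state labels)
    """
    H = sigmoid(W.T @ X)                        # (L, N) hidden activations
    # Upper-layer weights as a closed-form function of W:
    # U = argmin_U ||U^T H - T||^2, solved with a small ridge term for stability.
    L = H.shape[0]
    U = np.linalg.solve(H @ H.T + ridge * np.eye(L), H @ T.T)   # (L, C)
    Y = U.T @ H                                 # (C, N) network outputs
    error = np.sum((Y - T) ** 2)                # least-squares error objective
    # Gradient w.r.t. W with U(W) plugged into the objective,
    # chained through the sigmoid nonlinearity.
    grad_H = 2.0 * U @ (Y - T)                  # (L, N)
    grad_W = X @ (grad_H * H * (1.0 - H)).T     # (D, L)
    return error, grad_W
```

In a full training loop, `grad_W` would feed an accelerated batch update (e.g., a momentum- or line-search-style step along the most promising direction), with `U` re-solved in closed form after every change to `W`; the error and gradient computations are independent across mini-batches, which is what makes the scheme easy to parallelize across computers.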