Abstract

We recently developed context-dependent DNN-HMM (DeepNeural-Net/Hidden-Markov-Model) for large-vocabulary speech recognition. While achieving impressive recognition error rate reduction, we face the insurmountable problem of scalability in dealing with virtually unlimited amount of training data available nowadays. To overcome the scalability challenge, we have designed the deep convex network (DCN) architecture. The learning problem in DCN is convex within each module. Additional structure-exploited fine tuning further improves the quality of DCN. The full learning in DCN is batch-mode based instead of stochastic, naturally lending it amenable to parallel training that can be distributed over many machines. Experimental results on both MNIST and TIMIT tasks evaluated thus far demonstrate superior performance of DCN over the DBN (Deep Belief Network) counterpart that forms the basis of the DNN. The superiority is reflected not only in training scalability and CPU-only computation, but more importantly in classification accuracy in both tasks.

‚Äč