New types of deep neural network learning for speech recognition and related applications: An overview

  • Li Deng
  • Geoffrey Hinton
  • Brian Kingsbury

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013

In this paper, we provide an overview of the invited and contributed papers presented at the special session at ICASSP 2013, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications,” which the authors organized. We also describe the historical context in which acoustic models based on deep neural networks have been developed. The technical overview of the papers presented in our special session is organized around five ways of improving deep learning methods: (1) better optimization; (2) better types of neural activation functions and better network architectures; (3) better ways to determine the myriad hyper-parameters of deep neural networks; (4) more appropriate ways to preprocess speech for deep neural networks; and (5) ways of leveraging multiple languages or dialects, which are more easily achieved with deep neural networks than with Gaussian mixture models.