This short monograph expands on the material from two tutorials that the authors gave, one at APSIPA in October 2011 and the other at ICASSP in March 2012. Substantial updates have been made based on the literature up to March 2013, covering practical aspects of the rapid development of deep learning research during the intervening year.
In Chapter 1, we provide the background of deep learning, which is intrinsically connected to the use of multiple layers of nonlinear transformations to derive features from sensory signals such as speech and visual images. In the most recent literature, deep learning is embodied as representation learning, which involves a hierarchy of features or concepts in which higher-level concepts are defined from lower-level ones, and in which the same lower-level concepts can help to define many higher-level ones. Chapter 2 presents a brief historical account of deep learning, using the historical development of speech recognition in particular to illustrate the recent impact of deep learning. In Chapter 3, we develop a three-way classification scheme for a large body of work in deep learning: we classify a growing number of deep architectures into generative, discriminative, and hybrid categories, and present qualitative descriptions and a literature survey for each category.

In Chapters 4 to 6, we discuss in detail three popular deep learning architectures and their related learning methods, one from each category. Chapter 4 is devoted to deep autoencoders as a prominent example of (non-probabilistic) generative deep architectures. Chapter 5 presents a major example of the hybrid category: the discriminative feed-forward neural network with many layers, trained with layer-by-layer generative pre-training. Chapter 6 discusses in detail deep stacking networks and several of their variants, which exemplify the discriminative deep architectures in the three-way classification scheme.
In Chapters 7 to 11, we select a set of typical and successful applications of deep learning in diverse areas of signal and information processing. In Chapter 7, we review the applications of deep learning to speech recognition and audio processing. In Chapters 8 and 9, we present recent results of applying deep learning to language modeling and natural language processing, respectively. In Chapters 10 and 11, we discuss the applications of deep learning to information retrieval and to image, vision, and multimodal processing, respectively. Finally, Chapter 12 offers an epilogue that summarizes the earlier chapters and discusses future challenges and directions.