Achievements and Challenges of Deep Learning
APSIPA Transactions on Signal and Information Processing
While artificial neural networks have been around for over half a century, it was not until 2010 that they made a significant impact on speech recognition, in the form of deep networks. This article, based on my keynote talk given at the Interspeech conference in Singapore in September 2014, first reflects on the historical path to this transformative success, after providing brief reviews of earlier studies on (shallow) neural networks and on (deep) generative models relevant to the introduction of deep neural networks (DNNs) to speech recognition several years ago. The role of well-timed academic-industrial collaboration is highlighted, as are the advances in big data, big compute, and the seamless integration between the application-domain knowledge of speech and general principles of deep learning. Then, an overview is given of the sweeping achievements of deep learning in speech recognition since its initial success. Such achievements, summarized into six major areas in this article, have resulted in across-the-board, industry-wide deployment of deep learning in speech recognition systems. Next, more challenging applications of deep learning, namely natural language and multimodal processing, are selectively reviewed and analyzed. Examples include machine translation and automatic image captioning, where fresh ideas from deep learning, continuous-space embedding in particular, are shown to be revolutionizing these application areas, albeit at a less rapid pace than for speech and image recognition. Finally, a number of key issues in deep learning are discussed and future directions are analyzed for perceptual tasks such as speech, image, and video, as well as for cognitive tasks involving natural language.