On the difficulty of training recurrent and deep neural networks

Date

June 4, 2013

Speaker

Razvan Pascanu

Affiliation

University of Montreal

Overview

Deep learning is quickly becoming a popular subject in machine learning. Much of this success is due to advances in how these models are trained. Many questions nevertheless remain unanswered. In this talk I will give a short review of existing approaches, focusing on two distinct topics. The first concerns training recurrent neural models, and specifically addresses the notorious vanishing and exploding gradient problem introduced in Bengio et al. (1994). I will explore these issues from several perspectives: analytically, geometrically, and using intuitions from dynamical systems theory. These perspectives suggest hypotheses about the underlying causes of these events, which in turn lead to heuristic solutions that seem to work well in practice. The second theme of the talk is natural gradient as an alternative to stochastic gradient descent for learning. I will describe links between natural gradient and other recently proposed optimization techniques such as Hessian-Free, Krylov Subspace Descent and TONGA. I will discuss the specific properties of natural gradient that should help during training, and I will touch on efficient implementation and practical rules of thumb for using the algorithm. I hope to see many of you there.
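As a rough illustration of the vanishing and exploding gradient problem mentioned above, the sketch below backpropagates through a scalar linear recurrence h_t = w * h_{t-1}, where the gradient picks up one factor of w per time step. The function names, the toy recurrence, and the threshold are assumptions made for this example. Rescaling a gradient by its norm is shown as one commonly used heuristic against exploding gradients; the abstract does not say which heuristics the talk covers.

```python
import math

def backprop_factor(w, steps):
    """Magnitude of d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return abs(grad)

def clip_by_norm(grad, threshold):
    """Rescale a gradient vector (list of floats) so its L2 norm is at most threshold.

    Hypothetical helper for illustration; not taken from the talk.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]

# |w| < 1: the gradient shrinks geometrically with the number of steps.
vanishing = backprop_factor(0.5, 50)
# |w| > 1: the gradient grows geometrically with the number of steps.
exploding = backprop_factor(1.5, 50)
# Clipping keeps the direction of the gradient but bounds its norm.
clipped = clip_by_norm([exploding, 0.0], threshold=1.0)
```

In a real recurrent network the scalar w is replaced by a product of Jacobian matrices, so the behaviour depends on their spectrum rather than on a single number, but the geometric growth or decay is the same effect.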

Speakers

Razvan Pascanu

Razvan Pascanu is a fourth-year PhD student at the University of Montreal, working with Yoshua Bengio. His research is in the field of deep learning, with a focus on recurrent neural models. He is also a developer of Theano (http://deeplearning.net/software/theano), a Python library intended to make it possible to scale up machine learning algorithms, and has worked on the Deep Learning Tutorials (http://deeplearning.net/tutorial). He is currently doing an internship at Microsoft Research (MSR), working with Jay Stokes on applying recurrent and deep learning techniques to malware detection problems.