On the difficulty of training recurrent and deep neural networks
- Razvan Pascanu | University of Montreal
Deep learning is quickly becoming a popular subject in machine learning. Much of this success is due to advances in how these models are trained, yet many questions remain unanswered. In this talk I will give a short review of existing approaches, focusing on two distinct topics. The first concerns training recurrent neural models and specifically addresses the notorious vanishing and exploding gradient problem introduced in Bengio et al. (1994). I will explore these issues from different perspectives: analytically, geometrically, and through intuitions from dynamical systems theory. These perspectives suggest hypotheses about the underlying causes of the problem, which in turn lead to heuristic solutions that work well in practice. The second theme of the talk is natural gradient as an alternative to stochastic gradient descent for learning. I will describe links between natural gradient and other recently proposed optimization techniques such as Hessian-Free optimization, Krylov Subspace Descent, and TONGA. I will discuss the specific properties of natural gradient that should help during training, and touch on efficient implementation and practical rules of thumb for using the algorithm. I hope to see many of you there.
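One of the best-known heuristic fixes for the exploding-gradient problem mentioned above is gradient norm clipping: when the gradient's norm crosses a threshold, rescale it rather than follow it. A minimal NumPy sketch (an illustration of the general idea, not the talk's exact formulation; the function name and threshold value are chosen here for the example):

```python
import numpy as np

def clip_gradient_norm(grad, threshold):
    """Rescale `grad` so its L2 norm never exceeds `threshold`.

    When recurrent-network gradients "explode", their direction may
    still be useful even though their magnitude is not; clipping keeps
    the direction but bounds the step size.
    """
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

# A small gradient passes through unchanged...
small = clip_gradient_norm(np.array([0.3, 0.4]), threshold=1.0)
# ...while an exploding one is rescaled to lie on the threshold sphere.
large = clip_gradient_norm(np.array([30.0, 40.0]), threshold=1.0)
```

Here `small` is returned as-is (its norm, 0.5, is under the threshold), while `large` is shrunk from norm 50 down to norm 1 without changing its direction.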
Speaker Details
Razvan Pascanu is a fourth-year PhD student at the University of Montreal working with Yoshua Bengio. His research is in deep learning, with a focus on recurrent neural models. He is also a developer of Theano (http://deeplearning.net/software/theano), a Python library designed to make it possible to scale up machine learning algorithms, and has worked on the Deep Learning Tutorials (http://deeplearning.net/tutorial). He is currently an intern at MSR, working with Jay Stokes on applying recurrent and deep learning techniques to malware detection.
Series: Microsoft Research Talks