Deep learning is quickly becoming a popular subject in machine learning. Much of this success is due to advances in how these models are trained. However, many questions remain unanswered. In this talk I will give a short review of existing approaches, focusing on two distinct topics. The first concerns training recurrent neural models, and specifically addresses the notorious vanishing and exploding gradient problem introduced in Bengio et al. (1994). I will explore these issues from three perspectives: analytically, geometrically, and using intuitions from dynamical systems theory. These perspectives provide hypotheses about the underlying causes of vanishing and exploding gradients, which in turn lead to heuristic solutions that seem to work well in practice. The second theme of the talk is natural gradient as an alternative to stochastic gradient descent for learning. I will describe links between natural gradient and other recently proposed optimization techniques such as Hessian-Free optimization, Krylov Subspace Descent, and TONGA. I will discuss the specific properties of natural gradient that should help during training, and I will touch on efficient implementation and practical rules of thumb for using the algorithm. I hope to see many of you there.