Batch normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks. However, despite its pervasiveness, the exact reasons for BatchNorm’s effectiveness are still poorly understood.
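As a refresher, the BatchNorm transform normalizes each feature across the batch and then applies a learned scale and shift. A minimal NumPy sketch (the function name and the learned parameters `gamma` and `beta` follow the standard formulation; this is an illustration, not the talk's implementation):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension (axis 0),
    # then apply the learned scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Batch of 64 examples with 10 features, deliberately shifted and scaled.
x = np.random.randn(64, 10) * 3.0 + 5.0
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
```

With identity parameters (gamma = 1, beta = 0), each feature of the output has approximately zero mean and unit variance over the batch.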
In this talk, we take a closer look at the underpinnings of BatchNorm's success. In particular, we examine the popular belief that BatchNorm's effectiveness stems from its reduction of an effect called internal covariate shift (ICS). We then explore the connection between BatchNorm, ICS, and the optimization landscape of deep neural networks.