2nd-order Optimization for Neural Network Training
Neural networks have become the main workhorse of supervised learning, and their efficient training is an important technical challenge which has received a lot of attention. While stochastic gradient descent (SGD) with momentum works well…