SALSA

Established: September 1, 2019

Stochastic optimization methods, including stochastic gradient descent (SGD) and its many variants, serve as the workhorses of deep learning. One of the foremost pain points in using these methods in practice is hyperparameter tuning, especially the learning rate (step size). We develop a statistical adaptive procedure called SALSA to automatically schedule the learning rate for a broad family of stochastic gradient methods. Starting from a small but nonetheless arbitrary learning rate, SALSA first uses a smoothed line-search procedure to gradually increase it to a stable value, reducing the need for expensive trial and error in setting a good initial learning rate. SALSA then automatically switches to a statistical method, which detects stationarity of the learning process under a fixed learning rate, and drops the learning rate by a constant factor whenever stationarity is detected. The combined procedure is highly robust and autonomous, and it matches the performance of the best hand-tuned methods in several popular deep learning tasks.
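
To make the two-phase schedule concrete, below is a minimal, self-contained Python sketch of the control flow described above. It is not the statopt implementation: the class name `SalsaLikeScheduler`, the exponential-moving-average stand-in for the smoothed line search, the simple two-sample comparison used to detect stationarity, and all constants (`grow`, `drop`, `window`, `t_threshold`) are illustrative assumptions, not the statistical test from the paper. The actual PyTorch optimizer is available in the GitHub repository linked below.

```python
import random
import statistics


class SalsaLikeScheduler:
    """Illustrative two-phase learning-rate controller (not the official statopt code).

    Phase 1 ("warm-up"): grow the learning rate geometrically while a smoothed
    loss keeps improving, mimicking the smoothed line-search idea.
    Phase 2 ("statistical drop"): watch a sliding window of recent losses and,
    when the trend looks statistically flat (stationary), cut the learning
    rate by a constant factor. All thresholds below are placeholder choices.
    """

    def __init__(self, lr=1e-4, grow=1.1, drop=0.1, smooth=0.99,
                 window=200, t_threshold=1.0):
        self.lr = lr
        self.grow = grow            # multiplicative increase during warm-up
        self.drop = drop            # multiplicative decrease after stationarity
        self.smooth = smooth        # EMA coefficient for the smoothed loss
        self.window = window        # number of recent losses used by the test
        self.t_threshold = t_threshold
        self.ema = None
        self.prev_ema = None
        self.warmup = True
        self.losses = []

    def step(self, loss):
        """Feed the latest minibatch loss; return the learning rate to use next."""
        self.ema = loss if self.ema is None else \
            self.smooth * self.ema + (1.0 - self.smooth) * loss

        if self.warmup:
            # Keep growing the learning rate while the smoothed loss decreases.
            if self.prev_ema is None or self.ema < self.prev_ema:
                self.lr *= self.grow
            else:
                self.warmup = False   # smoothed loss stopped improving: freeze lr
            self.prev_ema = self.ema
            return self.lr

        # Phase 2: crude stationarity check on a sliding window of raw losses.
        self.losses.append(loss)
        self.losses = self.losses[-self.window:]
        if len(self.losses) >= self.window:
            half = self.window // 2
            first, second = self.losses[:half], self.losses[half:]
            diff = statistics.mean(first) - statistics.mean(second)
            se = (statistics.pvariance(first) / half +
                  statistics.pvariance(second) / half) ** 0.5
            # If the recent decrease is not clearly larger than the noise,
            # declare stationarity and drop the learning rate.
            if se == 0 or diff / se < self.t_threshold:
                self.lr *= self.drop
                self.losses.clear()
        return self.lr


if __name__ == "__main__":
    # Toy usage: a synthetic noisy loss stream standing in for minibatch losses.
    sched, loss = SalsaLikeScheduler(), 2.0
    for step in range(2000):
        loss = max(0.1, loss - sched.lr * 0.5) + random.gauss(0.0, 0.02)
        lr = sched.step(loss)
        if step % 500 == 0:
            print(f"step {step:4d}  loss {loss:.3f}  lr {lr:.2e}")
```

The intent of the sketch is only to show the division of labor: a warm-up phase that removes the need to guess a good initial learning rate, followed by an automatic, test-driven decision of when to decay it.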

Project GitHub page: https://github.com/microsoft/statopt