Efficient Second-order Optimization for Machine Learning
- Naman Agarwal | Princeton University
Stochastic gradient-based methods are the state of the art in large-scale machine learning optimization because of their extremely low per-iteration computational cost. Second-order methods, which use the second derivative of the optimization objective, are known to enable faster convergence, but they have been far less explored because of the high cost of computing second-order information. We will present second-order stochastic methods for convex and non-convex optimization problems arising in machine learning that match the per-iteration cost of gradient-based methods yet enjoy the faster convergence of second-order optimization, overall yielding algorithms faster than the best known gradient-based methods.
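The abstract does not spell out the algorithm, but one common way to obtain second-order information at roughly gradient-like per-iteration cost is to avoid forming the Hessian and instead use mini-batch Hessian-vector products inside a Neumann-series (LiSSA-style) estimate of the Newton direction. The sketch below illustrates this idea on L2-regularized logistic regression; the function names (`lissa_direction`, `hvp`), batch size, depth, and scaling are assumptions made for this illustration, not the specific algorithm presented in the talk.

```python
# Minimal sketch: a stochastic Newton-type step built only from mini-batch
# Hessian-vector products (the d-by-d Hessian is never formed explicitly).
# All names and hyperparameters here are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, X, y, lam):
    # Full gradient of the L2-regularized logistic loss.
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y) + lam * w

def hvp(w, v, Xb, lam):
    # Mini-batch Hessian-vector product: about the cost of a mini-batch
    # gradient, since only matrix-vector products with Xb are used.
    s = sigmoid(Xb @ w)
    curv = s * (1.0 - s)                      # per-example curvature
    return Xb.T @ (curv * (Xb @ v)) / len(Xb) + lam * v

def lissa_direction(w, g, X, y, lam, depth=100, batch=32, scale=0.5, rng=None):
    # Estimates H^{-1} g via the recursion x_{j+1} = g + (I - scale*H_j) x_j,
    # where each H_j is the Hessian of an independent mini-batch; the series
    # converges when the eigenvalues of scale*H lie in (0, 1).
    rng = rng or np.random.default_rng(0)
    x = g.copy()
    for _ in range(depth):
        idx = rng.choice(len(y), size=batch, replace=False)
        x = g + x - scale * hvp(w, x, X[idx], lam)
    return scale * x                          # undo the spectral scaling

# Tiny synthetic demo: a few approximate Newton steps on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.binomial(1, sigmoid(X @ rng.normal(size=20))).astype(float)
w, lam = np.zeros(20), 1e-2
for _ in range(20):
    w -= lissa_direction(w, grad(w, X, y, lam), X, y, lam, rng=rng)
```

Each inner iteration touches only a mini-batch, so the per-iteration cost stays on the order of a stochastic gradient step while the update incorporates curvature information.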