LSH-Sampling Breaks the Computation Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation
- Beidi Chen
Stochastic Gradient Descent or SGD is the most popular algorithm for large-scale optimization. In SGD, the gradient is estimated by uniform sampling with sample size one. There have been several results that show better gradient estimates, using weighted non-uniform sampling, which leads to faster convergence. Unfortunately, the per-iteration cost of maintaining this adaptive distribution is costlier than the exact gradient computation itself, which creates a chicken-and-egg loop making the fast convergence useless. In this paper, we break this barrier by providing the first demonstration of a sampling scheme, which leads to superior gradient estimation, while keeping the sampling cost per iteration similar to the uniform sampling. Such a scheme is possible due to recent advances in Locality Sensitive Hashing (LSH) literature. As a consequence, we improve the running time of all existing gradient descent algorithms.
Series: Microsoft Research Talks
-
Decoding the Human Brain – A Neurosurgeon’s Experience
- Dr. Pascal O. Zinn
-
-
-
-
-
-
Challenges in Evolving a Successful Database Product (SQL Server) to a Cloud Service (SQL Azure)
- Hanuma Kodavalla,
- Phil Bernstein
-
Improving text prediction accuracy using neurophysiology
- Sophia Mehdizadeh
-
Tongue-Gesture Recognition in Head-Mounted Displays
- Tan Gemicioglu
-
DIABLo: a Deep Individual-Agnostic Binaural Localizer
- Shoken Kaneko
-
-
-
-
Audio-based Toxic Language Detection
- Midia Yousefi
-
-
From SqueezeNet to SqueezeBERT: Developing Efficient Deep Neural Networks
- Forrest Iandola,
- Sujeeth Bharadwaj
-
Hope Speech and Help Speech: Surfacing Positivity Amidst Hate
- Ashique Khudabukhsh
-
-
-
Towards Mainstream Brain-Computer Interfaces (BCIs)
- Brendan Allison
-
-
-
-
Learning Structured Models for Safe Robot Control
- Subramanian Ramamoorthy
-