Stochastic Approximation and Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
- Adithya Devraj | University of Florida
Stochastic approximation algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. Among their many applications in machine learning, reinforcement learning algorithms such as TD-learning and Q-learning are two of the most famous.
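To make the idea concrete, here is a minimal sketch of a Robbins-Monro style recursion. It is only an illustration, not code from the talk: the function names (`robbins_monro`, `noisy_f`), the toy problem, and the step size 1/n are all assumptions chosen for simplicity. The goal is to find the root of an expectation E[f(theta, W)] = 0 using only noisy samples of f.

```python
import random


def robbins_monro(noisy_f, theta0, n_iter=20000, seed=0):
    """Iterate theta <- theta + (1/n) * f(theta, W_n) using noisy samples of f."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_iter + 1):
        step = 1.0 / n  # classical vanishing step size (an assumption of this sketch)
        theta = theta + step * noisy_f(theta, rng)
    return theta


def noisy_f(theta, rng):
    # Toy problem (hypothetical): f(theta, W) = W - theta with W ~ N(3, 1).
    # The mean of f is 3 - theta, so the root is theta* = 3,
    # but the algorithm never sees the distribution of W directly.
    w = rng.gauss(3.0, 1.0)
    return w - theta


estimate = robbins_monro(noisy_f, theta0=0.0)
print(estimate)
```

With this particular f and step size, the recursion reduces to a running sample mean of W, which is why the estimate settles near 3.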
This talk will provide an overview of stochastic approximation, with a focus on optimizing the rate of convergence. This general theory explains the well-known slow convergence of Q-learning: the asymptotic variance of the algorithm is typically infinite. Three new Q-learning algorithms are introduced to dramatically improve performance: (i) the Zap Q-learning algorithm, which has provably optimal asymptotic variance and resembles the Newton-Raphson method in a deterministic setting; (ii) the PolSA algorithm, which is based on Polyak’s momentum technique but uses a specialized matrix momentum; and (iii) the NeSA algorithm, which is based on Nesterov’s acceleration technique.
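The effect of the gain on convergence speed can be seen in a toy scalar problem. The sketch below is not Q-learning and not the Zap algorithm from the talk; it is a hypothetical linear example (the function `run` and the values of `a`, `b`, and the noise levels are assumptions) that compares a plain 1/n gain with a Newton-Raphson style gain that divides by a running estimate of the slope. The target is theta* = b / a, observed only through noisy samples of a and b.

```python
import random


def run(n_iter=20000, a=0.1, b=1.0, seed=0, newton_gain=False):
    """Solve a * theta = b from noisy samples (a_n, b_n) with step size 1/n."""
    rng = random.Random(seed)
    theta = 0.0
    a_hat = 0.0  # running average of the observed slopes a_n
    for n in range(1, n_iter + 1):
        a_n = a + rng.uniform(-0.05, 0.05)  # noisy slope, always positive here
        b_n = b + rng.gauss(0.0, 0.1)       # noisy right-hand side
        a_hat += (a_n - a_hat) / n
        # Newton-Raphson style gain: rescale the update by the estimated slope.
        gain = 1.0 / a_hat if newton_gain else 1.0
        theta += (gain / n) * (b_n - a_n * theta)
    return theta


theta_star = 1.0 / 0.1  # exact solution b / a for the default parameters
plain_error = abs(run(newton_gain=False) - theta_star)
newton_error = abs(run(newton_gain=True) - theta_star)
print(plain_error, newton_error)
```

Because the slope `a` is below 1/2, the plain 1/n recursion forgets its initial condition only at a rate of roughly n^(-a), so its error remains large after many iterations. Rescaling by the estimated slope makes the recursion behave like simple averaging, which is the intuition behind the Newton-Raphson resemblance mentioned above.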
Analysis of (ii) and (iii) requires entirely new analytic techniques. One approach is via coupling: conditions are established under which the parameter estimates obtained using the PolSA algorithm couple with those obtained using the Newton-Raphson-based algorithm. Numerical examples confirm this behavior and the remarkable performance of these algorithms.
View presentation slides here: https://www.microsoft.com/en-us/research/wp-content/uploads/2018/11/Hidden-Theory-and-New-Super-Fast-Algorithms-SLIDES.pdf
Speaker Details
Adithya is a PhD student at the University of Florida, where he works with Prof. Sean Meyn. His research focuses on variance reduction in stochastic approximation algorithms, with application to reinforcement learning. He is currently an intern at the Tencent AI Lab; before that, he held visiting and research positions at the Indian Institute of Science, Bangalore; Inria, Paris; and the Simons Institute for the Theory of Computing at UC Berkeley.
Lin Xiao
Senior Principal Researcher
Series: Microsoft Research Talks