Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning
- David Janz | University of Cambridge
Probabilistic Q-learning is a promising approach to balancing exploration and exploitation in reinforcement learning.
However, existing implementations have significant limitations: they either fail to incorporate uncertainty about the long-term consequences of actions or ignore fundamental dependencies in state-action values implied by the Bellman equation. These problems result in sub-optimal exploration. As a solution, we develop Successor Uncertainties (SU), a probabilistic Q-learning method free of the aforementioned problems. SU outperforms existing baselines on tabular problems and on the Atari benchmark suite. Overall, SU is an improved and scalable probabilistic Q-learning method with better properties than its predecessors at no extra cost.
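For context on the state-action value dependencies the abstract refers to, the sketch below shows standard tabular Q-learning, where the Bellman (temporal-difference) update couples the value of each state-action pair to the values of its successors. The toy chain environment, hyperparameters, and uniform exploration here are illustrative assumptions, not the SU method from the talk; SU replaces this naive exploration with a posterior over Q-values.

```python
import numpy as np

# Toy 1-D chain MDP: 5 states, actions 0 = left, 1 = right.
# Reward 1 only when the agent reaches the rightmost state.
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5  # discount factor and learning rate (illustrative)

def step(s, a):
    """Deterministic chain transition with reward at the right end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

for _ in range(2000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))  # uniform random exploration, for simplicity
    s_next, r = step(s, a)
    # Bellman / temporal-difference update:
    # Q(s, a) moves toward r + gamma * max_a' Q(s', a'),
    # tying each state-action value to its successor values.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

print(np.argmax(Q, axis=1))  # greedy policy: move right in every state
```

Because each update propagates value backward through successor states, a point estimate of Q alone gives no notion of how uncertain those propagated values are; probabilistic Q-learning methods such as SU maintain a distribution over Q to drive exploration instead.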
Speaker Details
Second year PhD student at the University of Cambridge Machine Learning Group supervised by José Miguel Hernández-Lobato and Zoubin Ghahramani.