Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

  • David Janz | University of Cambridge

Probabilistic Q-learning is a promising approach to balancing exploration and exploitation in reinforcement learning.
However, existing implementations have significant limitations: they either fail to incorporate uncertainty about the long-term consequences of actions or ignore fundamental dependencies between state-action values implied by the Bellman equation. These problems result in sub-optimal exploration. As a solution, we develop Successor Uncertainties (SU), a probabilistic Q-learning method free of the aforementioned problems. SU outperforms existing baselines on tabular problems and on the Atari benchmark suite. Overall, SU is an improved and scalable probabilistic Q-learning method with better properties than its predecessors at no extra cost.
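For context, the sketch below illustrates the kind of posterior-sampling (Thompson sampling) exploration the abstract refers to: modelling Q(s, a) as w^T psi(s, a) with a Gaussian posterior over w, so that a single sample of w induces a jointly consistent sample of all Q-values, respecting the correlations the Bellman equation implies. This is a minimal toy illustration, not the paper's construction: the successor features psi, the data, and all dimensions are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 5 states, 2 actions, feature dimension 5.
n_states, n_actions, d = 5, 2, 5
# Stand-in successor features psi(s, a); in SU these would be learned.
psi = rng.normal(size=(n_states, n_actions, d))
X = psi.reshape(-1, d)              # design matrix, one row per (s, a) pair
y = rng.normal(size=X.shape[0])     # stand-in observed returns

# Bayesian linear regression posterior over w: N(mu, Sigma),
# with prior w ~ N(0, I) and observation noise variance sigma2.
sigma2 = 0.1
Sigma = np.linalg.inv(np.eye(d) + X.T @ X / sigma2)
mu = Sigma @ X.T @ y / sigma2

# Thompson sampling: draw one w, then act greedily under the induced Q.
# Because every Q(s, a) shares the same sampled w, the sampled Q-values
# are correlated across state-action pairs rather than independent.
w = rng.multivariate_normal(mu, Sigma)
Q = psi @ w                         # shape (n_states, n_actions)
policy = Q.argmax(axis=1)           # greedy action in each state
print(policy)
```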

Speaker Details

Second-year PhD student at the University of Cambridge Machine Learning Group, supervised by José Miguel Hernández-Lobato and Zoubin Ghahramani.