Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning
- David Janz | University of Cambridge
Probabilistic Q-learning is a promising approach to balancing exploration and exploitation in reinforcement learning.
However, existing implementations have significant limitations: they either fail to incorporate uncertainty about the long-term consequences of actions or ignore fundamental dependencies in state-action values implied by the Bellman equation. These problems result in sub-optimal exploration. As a solution, we develop Successor Uncertainties (SU), a probabilistic Q-learning method free of the aforementioned problems. SU outperforms existing baselines on tabular problems and on the Atari benchmark suite. Overall, SU is an improved and scalable probabilistic Q-learning method with better properties than its predecessors at no extra cost.
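For context on the state-action value dependencies the abstract refers to, the sketch below shows standard tabular Q-learning, where the Bellman (temporal-difference) update couples the value of each state-action pair to the values of its successors. The toy chain environment, hyperparameters, and uniform exploration here are illustrative assumptions, not the SU method from the talk; SU replaces this naive exploration with a posterior over Q-values.

```python
import numpy as np

# Toy 1-D chain MDP: 5 states, actions 0 = left, 1 = right.
# Reward 1 only when the agent reaches the rightmost state.
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5  # discount factor and learning rate (illustrative)

def step(s, a):
    """Deterministic chain transition with reward at the right end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

for _ in range(2000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))  # uniform random exploration, for simplicity
    s_next, r = step(s, a)
    # Bellman / temporal-difference update:
    # Q(s, a) moves toward r + gamma * max_a' Q(s', a'),
    # tying each state-action value to its successor values.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

print(np.argmax(Q, axis=1))  # greedy policy: move right in every state
```

Because each update propagates value backward through successor states, a point estimate of Q alone gives no notion of how uncertain those propagated values are; probabilistic Q-learning methods such as SU maintain a distribution over Q to drive exploration instead.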
Speaker Details
Second year PhD student at the University of Cambridge Machine Learning Group supervised by José Miguel Hernández-Lobato and Zoubin Ghahramani.