Temporal Difference Models: Model-Free Deep RL for Model-Based Control

  • Shane Gu | University of Cambridge

Deep reinforcement learning (RL) has shown promising results for learning complex sequential decision-making behaviors in a variety of environments. However, most successes have been confined to simulation, and results in real-world applications such as robotics remain limited, largely due to the poor sample efficiency of typical deep RL algorithms. I will introduce temporal difference models (TDMs), an extension of goal-conditioned value functions that enables model-based planning at multiple temporal resolutions. TDMs generalize traditional predictive models, bridge the gap between model-based and off-policy model-free RL, and yield substantial improvements in sample efficiency without sacrificing asymptotic performance.
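To make the idea of a goal- and horizon-conditioned value function concrete, below is a minimal sketch of the TDM-style Bellman target: at horizon tau = 0 the target is the negative distance between the reached state and the goal, and at tau > 0 it bootstraps from the best action at the next state with the horizon decremented by one. The function names, the dummy Q-function, and the discretized candidate actions are illustrative assumptions for this announcement, not the implementation presented in the talk.

    import numpy as np

    def dummy_q(s, a, g, tau):
        """Stand-in for Q(s, a, g, tau); in practice this is a neural network."""
        return -np.linalg.norm(s + a - g) - 0.1 * tau

    def tdm_target(q, s_next, goal, tau):
        """TDM regression target for a stored transition (s, a, s', g, tau).

        tau == 0: target is the negative distance to the goal, -||s' - g||.
        tau  > 0: bootstrap from the best action at s' with horizon tau - 1.
        """
        if tau == 0:
            return -np.linalg.norm(s_next - goal)
        candidate_actions = [np.array([a]) for a in np.linspace(-1.0, 1.0, 11)]
        return max(q(s_next, a, goal, tau - 1) for a in candidate_actions)

    # Example: a 1-D state partway toward a goal, with 3 steps of horizon left.
    s_next, goal = np.array([0.4]), np.array([1.0])
    print(tdm_target(dummy_q, s_next, goal, tau=3))

Because the value function is conditioned on both the goal and the remaining horizon, the same learned function can answer "how close can I get to g in tau steps?", which is what lets TDMs act as multi-step predictive models during planning.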

Speaker Details

Shixiang (Shane) Gu is a PhD candidate at the University of Cambridge and the Max Planck Institute for Intelligent Systems, where he is co-supervised by Richard E. Turner, Zoubin Ghahramani, and Bernhard Schoelkopf. He holds a BASc in Engineering Science from the University of Toronto, where he completed his thesis with Professor Geoffrey Hinton. His research interests span deep reinforcement learning, deep learning, robotics, approximate inference, and causality, and his work has been featured in MIT Technology Review and on the Google Research Blog. He also collaborates closely with Sergey Levine from UC Berkeley/Google and Tim Lillicrap from DeepMind.