Deep Interactive Bayesian Reinforcement Learning via Meta-Learning
Agents that interact with other agents often do not know a priori what the other agents’ strategies are, but have to maximise their own online return while interacting with and learning about others. The optimal adaptive behaviour under uncertainty over the other agents’ strategies w.r.t. some prior can in principle be computed using the Interactive Bayesian Reinforcement Learning framework. Unfortunately, doing so is intractable in most settings, and existing approximation methods are restricted to small tasks. To overcome this, we propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior. To model beliefs over other agents, we combine sequential and hierarchical Variational Auto-Encoders, and meta-train this inference model alongside the policy. We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
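To make the idea of a belief over other agents' strategies concrete, here is a minimal sketch of exact Bayesian belief updating in a case small enough to be tractable: a finite set of fixed opponent strategies. It is not the approach proposed in the paper, which meta-learns approximate inference with Variational Auto-Encoders. The opponent types, their action probabilities, and every function name below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical opponent types (illustration only): each type is a fixed,
# memory-free strategy described by its probability of playing "cooperate".
OPPONENT_TYPES: Dict[str, float] = {
    "mostly_cooperates": 0.9,
    "mostly_defects": 0.1,
}


def action_likelihoods(action: str) -> Dict[str, float]:
    """Return P(observed action | type) for every hypothetical opponent type."""
    return {
        name: (p_coop if action == "cooperate" else 1.0 - p_coop)
        for name, p_coop in OPPONENT_TYPES.items()
    }


def update_belief(belief: Dict[str, float],
                  likelihoods: Dict[str, float]) -> Dict[str, float]:
    """One Bayes step: posterior(type) is proportional to prior(type) * likelihood."""
    unnormalised = {name: belief[name] * likelihoods[name] for name in belief}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("Observation has zero probability under every type.")
    return {name: value / total for name, value in unnormalised.items()}


def posterior_after(actions: List[str]) -> Dict[str, float]:
    """Start from a uniform prior and fold in each observed action in order."""
    belief = {name: 1.0 / len(OPPONENT_TYPES) for name in OPPONENT_TYPES}
    for action in actions:
        belief = update_belief(belief, action_likelihoods(action))
    return belief
```

Calling `posterior_after(["cooperate", "cooperate"])` shifts most of the probability mass onto the `mostly_cooperates` type. A Bayes-optimal agent would additionally plan over how its own actions change this belief, and that planning step is what becomes intractable beyond small tasks.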
Games have a long history as test beds in pushing AI research forward. From early works on chess and Go to more recent advances on modern video games, researchers have used games as complex decision-making benchmarks. Learning in multi-agent settings is one of the fundamental problems in AI research, posing unique challenges for agents that learn independently, such as coordinating with other learning agents or adapting rapidly online to agents they haven’t previously learned with.

In this webinar, join Microsoft researcher Sam Devlin and Queen Mary University of London researchers Martin Balla, Raluca D. Gaina, and Diego Perez-Liebana to learn how the latest AI techniques can be applied to multiplayer games in the challenging and diverse 3D environment of Minecraft. The researchers will demonstrate how Project Malmo, a platform for AI experimentation built on Minecraft, provides an ideal environment for designing different and rich training tasks and how reinforcement learning agents can be trained in these scenarios. They’ll provide examples of tasks, agent implementations, and the latest research done in this area.

Together, you’ll explore:
- The Malmo platform and multi-agent tasks
- Using the reinforcement learning library RLlib to implement and train agents to complete Minecraft tasks
- Coordinated policies for collaborative multi-agent tasks
- Open challenges in learning robust policies for ad-hoc teamwork

Resource list:
- Project Malmo - Microsoft Research (project page)
- Project Malmo key repository (GitHub)
- Difference Rewards Policy Gradients (paper)
- Deep Interactive Bayesian Reinforcement Learning via Meta-Learning (paper)

*This on-demand webinar features a previously recorded Q&A session and open captioning.