{"id":696090,"date":"2020-10-05T11:00:18","date_gmt":"2020-10-05T18:00:18","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&#038;p=696090"},"modified":"2025-08-06T11:52:24","modified_gmt":"2025-08-06T18:52:24","slug":"reinforcement-learning-day-2021","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day-2021\/","title":{"rendered":"Reinforcement Learning Day 2021"},"content":{"rendered":"\n\n<p><strong>This event has now concluded. On-demand content is available on the <\/strong><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day-2021\/#!videos\"><strong>Videos tab<\/strong><\/a>.<\/p>\n<p>Previous events:<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day-2019\/\">RL Day 2019<\/a><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day\/\">RL Day 2018<\/a><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p>Reinforcement learning is the study of decision making with consequences over time. The topic draws together multi-disciplinary efforts from computer science, cognitive science, mathematics, economics, control theory, and neuroscience. The common thread through all of these studies is: how do natural and artificial systems learn to make decisions in complex environments based on external, and possibly delayed, feedback.<\/p>\n<p>This virtual workshop featured talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. 
A key objective was to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.<\/p>\n<div style=\"height: 20px\"><\/div>\n<h3>Committee Chairs<\/h3>\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/akshaykr\/\">Akshay Krishnamurthy<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chinganc\/\">Ching-An Cheng<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dimisra\/\">Dipendra Misra<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/idamo\/\">Ida Momennejad<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/t-roloft\/\">Robert Loftin<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<\/p>\n<div style=\"height: 60px\"><\/div>\n<h3>Microsoft\u2019s Event Code of Conduct<\/h3>\n<p>Microsoft\u2019s mission is to empower every person and every organization on the planet to achieve more. This includes virtual events Microsoft hosts and participates in, where we seek to create a respectful, friendly, and inclusive experience for all participants. 
As such, we do not tolerate harassing or disrespectful behavior, messages, images, or interactions by any event participant, in any form, at any aspect of the program including business and social activities, regardless of location.<\/p>\n<p>We do not tolerate any behavior that is degrading to any gender, race, sexual orientation or disability, or any behavior that would violate <a href=\"https:\/\/www.microsoft.com\/en-us\/legal\/compliance\/default.aspx\">Microsoft\u2019s Anti-Harassment and Anti-Discrimination Policy, Equal Employment Opportunity Policy, or Standards of Business Conduct<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. In short, the entire experience must meet our culture standards. We encourage everyone to assist in creating a welcoming and safe environment. Please <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"https:\/\/app.convercent.com\/en-us\/Anonymous\/IssueIntake\/LandingPage\/65d3b907-0933-e611-8105-000d3ab03673\">report<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> any concerns, harassing behavior, or suspicious or disruptive activity. 
Microsoft reserves the right to ask attendees to leave at any time at its sole discretion.<\/p>\n<div style=\"height: 20px\"><\/div>\n<div>\n\t<a\n\t\thref=\"https:\/\/app.convercent.com\/en-us\/Anonymous\/IssueIntake\/LandingPage\/65d3b907-0933-e611-8105-000d3ab03673\"\n\t\tclass=\"button cta-link\"\n\t\tdata-bi-type=\"button\"\n\t\tdata-bi-cN=\"Report a concern\"\n\t\tdata-bi-tN=\"shortcodes\/msr-button\"\n\t\ttarget=\"_blank\" rel=\"noopener noreferrer\">\n\t\tReport a concern\t<\/a>\n\n\t<\/div>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<h3>This event has now concluded.<\/h3>\n<h2>Thursday,\u202fJanuary 14, 2021<\/h2>\n<table style=\"border-spacing: inherit;border-collapse: collapse;width: 100%;padding: 8px;text-align: left;border-bottom: 1px solid #000000\">\n<tbody>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Time (EST) <\/strong><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Session <\/strong><\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Speaker <\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">10:00 AM-10:15 AM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Welcome Remarks<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/08\/Akshay_Krishnamurthy_125x125.jpg\" alt=\"Portrait of Akshay Krishnamurthy\" width=\"80\" height=\"80\" \/><\/td>\n<td 
style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/akshaykr\/\">Akshay Krishnamurthy<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">10:15 AM-11:00 AM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">New Advances in Hierarchical Reinforcement Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/DoinaPrecup-Headshot.jpeg\" alt=\"Portrait of Doina Precup\" width=\"80\" height=\"80\" \/><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"https:\/\/www.cs.mcgill.ca\/~dprecup\/\">Doina Precup<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, McGill University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">11:00 AM-11:45 AM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Reinforcement Learning Debate: The State of RL and The Theory-Practice Divide<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/John-Langford_360x360-300x300.jpg\" alt=\"Portrait of John Langford\" width=\"80\" height=\"80\" \/><\/p>\n<div 
style=\"height: 8px\"><\/div>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/11\/YoshuaBengio_Headshot-PHDS-5fad9df67a570.jpg\" alt=\"Portrait of Yoshua Bengio\" width=\"80\" height=\"80\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<\/p>\n<div style=\"height: 8px\"><\/div>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\">Yoshua Bengio<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u202fMila (Quebec AI Institute)<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">11:45 AM-12:15 PM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Break<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">12:15 PM-1:45 PM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Virtual Poster Presentations<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 
30%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Yunzong Xu, MIT<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Taylor Expansion Policy Optimization<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Yunhao Tang, Columbia University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Provably Efficient Policy Optimization with Thompson Sampling<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Haque Ishfaq, McGill University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Active Imitation Learning with Noisy Guidance<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td 
style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Kiant\u00e9 Brantley, University of Maryland<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Finite-Time Analysis of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Sihan Zeng, Georgia Tech<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">META-Q-LEARNING<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Rasool Fakoor, Amazon Web Services<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Toward the Fundamental Limits of Imitation Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Nived Rajaraman, UC Berkeley<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Multitask Bandit Learning through Heterogeneous Feedback Aggregation<\/td>\n<td style=\"vertical-align: 
middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Zhi Wang, UC San Diego<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">&#8220;It\u2019s Unwieldy and it Takes a Lot of Time.\u201d Challenges and Opportunities for Creating Agents in Commercial Games<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Mikhail Jacob, Microsoft Research, Cambridge UK<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">A Framework for Robust Learning and Control of Nonlinear Systems with Large Uncertainty<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Hoang Le, Microsoft Research, Redmond<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Learning Dynamic Belief Graphs to Generalize on Text-Based Games<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Eric Yuan, Microsoft Research, Montreal<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td 
style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Frugal Optimization for Cost-Related Hyperparameters<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Qingyun Wu, Microsoft Research, NYC<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Denis Yarats, New York University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Self Supervised Policy Adaptation During Deployment<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Nicklas Hansen, Technical University of Denmark<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Multi-Task Reinforcement Learning with Soft Modularization<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Ruihan Yang, UC San Diego<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: 
middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Rishabh Agarwal, Google Research, and Mila Research<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">A Regret Minimization Approach to Iterative Learning Control<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Karan Singh, Princeton University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">RMP2: A Differentiable Policy Class for Robotic Systems with Control-Theoretic Guarantees<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Anqi Li, University of Washington<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Generating Adversarial Disturbances for Controller Verification<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: 
middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Udaya Ghai, Princeton University<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div style=\"height: 20px\"><\/div>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<h3>This event has now concluded.<\/h3>\n<h2>Call for virtual poster session<\/h2>\n<p>Reinforcement learning is a field that studies the problem of sequential decision making with unknown and potentially long-term consequences. Reinforcement learning is a multi-disciplinary topic, bringing together diverse fields of study including computer science, cognitive science, mathematics, psychology, economics, control theory, and neuroscience. The common theme that connects these fields, and the core goal of reinforcement learning, is the question: <strong><em>How do natural and artificial systems learn to make decisions in complex, unknown environments based on limited, noisy, and possibly delayed feedback?<\/em><\/strong><\/p>\n<p>This virtual workshop aims to bring together researchers from industry and academia to share and discuss recent advances, challenges, and future research directions for reinforcement learning. Our goal is to highlight emerging research opportunities for the reinforcement learning community, particularly those driven by the evolving need for robust decision making in practical applications. 
Reinforcement Learning Day 2021 will provide an opportunity for different research communities to learn from each other and build on the latest knowledge in reinforcement learning and related disciplines.<\/p>\n<h3>Invited speakers<\/h3>\n<p>Reinforcement Learning Day 2021 will feature invited talks and conversations with leaders in the field, including <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\">Yoshua Bengio<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<\/a>, whose research covers a broad array of topics related to reinforcement learning. For more details please see the agenda page.<\/p>\n<h3>Virtual poster session<\/h3>\n<p>In addition to our speaker program, Reinforcement Learning Day 2021 will include a virtual poster session, showcasing recent and ongoing research in all areas of reinforcement learning.<\/p>\n<p>We invite you to submit posters on all topics related to reinforcement learning. 
Suggested topics include (but are certainly not limited to):<\/p>\n<ul>\n<li>Deep Reinforcement Learning<\/li>\n<li>Reinforcement Learning Theory<\/li>\n<li>Bandit Algorithms<\/li>\n<li>Multi-Agent Reinforcement Learning<\/li>\n<li>Reinforcement Learning Benchmarks and Datasets<\/li>\n<li>Reinforcement Learning with Natural Language<\/li>\n<li>Human-in-the-Loop Reinforcement Learning<\/li>\n<li>Imitation Learning<\/li>\n<li>Control Theory<\/li>\n<li>Cross-Disciplinary Research with Reinforcement Learning: Structured Prediction, Game Theory, Operations Research, Fairness, Active Learning, Causality, Privacy, etc.<\/li>\n<li>Applications of Reinforcement Learning: Recommender Systems, Robotics, Healthcare, Education, Conversational AI, Gaming, Finance, Neuroscience, Manufacturing, etc.<\/li>\n<\/ul>\n<h3>What to submit<\/h3>\n<p>We invite the submission of extended abstracts (1-4 pages) on topics related to reinforcement learning. Authors of accepted abstracts will be invited to present their work at our virtual poster session (via Microsoft Teams), giving authors the opportunity for in-depth discussions with other Reinforcement Learning Day 2021 participants, presenters, and Microsoft researchers. Abstract reviewing will be single-blind. From the submissions, we will accept only 10-15 presenters. 
Accepted presenters will be asked to prepare pre-recorded video presentations to complement the live discussion during the virtual poster session.<\/p>\n<p>Please submit your abstract to <a href=\"mailto:msrrlday@microsoft.com\">msrrlday@microsoft.com<\/a>.<\/p>\n<h3>Important dates<\/h3>\n<ul>\n<li>December 11, 2020: Abstract submission deadline<\/li>\n<li>December 22, 2020: Author notification<\/li>\n<li>January 14, 2021: Reinforcement Learning Day 2021 \u2013 virtual workshop!<\/li>\n<\/ul>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This virtual reinforcement learning workshop will feature talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control.<\/p>\n","protected":false},"featured_media":705232,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_startdate":"2021-01-14","msr_enddate":"2021-01-14","msr_location":"Virtual","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":false,"msr_private_event":false,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[13556],"msr-region":[256048],"msr-event-type":[197944],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-696090","msr-event","type-msr-event","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-region-global","msr-event-type-hosted-by-microsoft","msr-locale-en_us"],"msr_about":"<!-- wp:msr\/event-details {\"title\":\"Reinforcement Learning Day 
2021\",\"backgroundColor\":\"grey\",\"image\":{\"id\":705232,\"url\":\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720.jpg\",\"alt\":\"\"},\"imageType\":\"full-bleed\"} \/-->\n\n<!-- wp:msr\/content-tabs --><!-- wp:msr\/content-tab {\"title\":\"About\"} --><!-- wp:freeform --><p><strong>This event has now concluded. On-demand content is available on the <\/strong><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day-2021\/#!videos\"><strong>Videos tab<\/strong><\/a>.<\/p>\n<p>Previous events:<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day-2019\/\">RL Day 2019<\/a><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day\/\">RL Day 2018<\/a><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p>Reinforcement learning is the study of decision making with consequences over time. The topic draws together multi-disciplinary efforts from computer science, cognitive science, mathematics, economics, control theory, and neuroscience. The common thread through all of these studies is: how do natural and artificial systems learn to make decisions in complex environments based on external, and possibly delayed, feedback.<\/p>\n<p>This virtual workshop featured talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. 
A key objective was to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.<\/p>\n<div style=\"height: 20px\"><\/div>\n<h3>Committee Chairs<\/h3>\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/akshaykr\/\">Akshay Krishnamurthy<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chinganc\/\">Ching-An Cheng<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dimisra\/\">Dipendra Misra<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/idamo\/\">Ida Momennejad<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/t-roloft\/\">Robert Loftin<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<\/p>\n<div style=\"height: 60px\"><\/div>\n<h3>Microsoft\u2019s Event Code of Conduct<\/h3>\n<p>Microsoft\u2019s mission is to empower every person and every organization on the planet to achieve more. This includes virtual events Microsoft hosts and participates in, where we seek to create a respectful, friendly, and inclusive experience for all participants. 
As such, we do not tolerate harassing or disrespectful behavior, messages, images, or interactions by any event participant, in any form, at any aspect of the program including business and social activities, regardless of location.<\/p>\n<p>We do not tolerate any behavior that is degrading to any gender, race, sexual orientation or disability, or any behavior that would violate <a href=\"https:\/\/www.microsoft.com\/en-us\/legal\/compliance\/default.aspx\">Microsoft\u2019s Anti-Harassment and Anti-Discrimination Policy, Equal Employment Opportunity Policy, or Standards of Business Conduct<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. In short, the entire experience must meet our culture standards. We encourage everyone to assist in creating a welcoming and safe environment. Please <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"https:\/\/app.convercent.com\/en-us\/Anonymous\/IssueIntake\/LandingPage\/65d3b907-0933-e611-8105-000d3ab03673\">report<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> any concerns, harassing behavior, or suspicious or disruptive activity. 
Microsoft reserves the right to ask attendees to leave at any time at its sole discretion.<\/p>\n<div style=\"height: 20px\"><\/div>\n<div>\n\t<a\n\t\thref=\"https:\/\/app.convercent.com\/en-us\/Anonymous\/IssueIntake\/LandingPage\/65d3b907-0933-e611-8105-000d3ab03673\"\n\t\tclass=\"button cta-link\"\n\t\tdata-bi-type=\"button\"\n\t\tdata-bi-cN=\"Report a concern\"\n\t\tdata-bi-tN=\"shortcodes\/msr-button\"\n\t\ttarget=\"_blank\" rel=\"noopener noreferrer\">\n\t\tReport a concern\t<\/a>\n\n\t<\/div>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- wp:msr\/content-tab {\"title\":\"Agenda\"} --><!-- wp:freeform --><h3>This event has now concluded.<\/h3>\n<h2>Thursday,\u202fJanuary 14, 2021<\/h2>\n<table style=\"border-spacing: inherit;border-collapse: collapse;width: 100%;padding: 8px;text-align: left;border-bottom: 1px solid #000000\">\n<tbody>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Time (EST) <\/strong><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Session <\/strong><\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Speaker <\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">10:00 AM-10:15 AM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Welcome Remarks<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" 
src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/08\/Akshay_Krishnamurthy_125x125.jpg\" alt=\"Portrait of Akshay Krishnamurthy\" width=\"80\" height=\"80\" \/><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/akshaykr\/\">Akshay Krishnamurthy<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">10:15 AM-11:00 AM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">New Advances in Hierarchical Reinforcement Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/DoinaPrecup-Headshot.jpeg\" alt=\"Portrait of Doina Precup\" width=\"80\" height=\"80\" \/><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"https:\/\/www.cs.mcgill.ca\/~dprecup\/\">Doina Precup<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, McGill University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">11:00 AM-11:45 AM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Reinforcement Learning Debate: The State of RL and The Theory-Practice Divide<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" 
src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/John-Langford_360x360-300x300.jpg\" alt=\"Portrait of John Langford\" width=\"80\" height=\"80\" \/><\/p>\n<div style=\"height: 8px\"><\/div>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/11\/YoshuaBengio_Headshot-PHDS-5fad9df67a570.jpg\" alt=\"Portrait of Yoshua Bengio\" width=\"80\" height=\"80\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft Research<\/p>\n<div style=\"height: 8px\"><\/div>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\">Yoshua Bengio<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u202fMila (Quebec AI Institute)<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">11:45 AM-12:15 PM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Break<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">12:15 PM-1:45 PM<\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid 
#000000\">Virtual Poster Presentations<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Yunzong Xu, MIT<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Taylor Expansion Policy Optimization<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Yunhao Tang, Columbia University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Provably Efficient Policy Optimization with Thompson Sampling<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Haque Ishfaq, McGill University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px 
solid #000000\">Active Imitation Learning with Noisy Guidance<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Kiant\u00e9 Brantley, University of Maryland<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Finite-Time Analysis of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Sihan Zeng, Georgia Tech<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">META-Q-LEARNING<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Rasool Fakoor, Amazon Web Services<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Toward the Fundamental Limits of Imitation Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Nived Rajaraman, UC Berkeley<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: 
middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Multitask Bandit Learning through Heterogeneous Feedback Aggregation<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Zhi Wang, UC San Diego<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">&#8220;It\u2019s Unwieldy and it Takes a Lot of Time.\u201d Challenges and Opportunities for Creating Agents in Commercial Games<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Mikhail Jacob, Microsoft Research, Cambridge UK<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">A Framework for Robust Learning and Control of Nonlinear Systems with Large Uncertainty<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Hoang Le, Microsoft Research, Redmond<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Learning Dynamic Belief Graphs to Generalize on Text-Based Games<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Eric Yuan, 
Microsoft Research, Montreal<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Frugal Optimization for Cost-Related Hyperparameters<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Qingyun Wu, Microsoft Research, NYC<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Denis Yarats, New York University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Self Supervised Policy Adaptation During Deployment<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Nicklas Hansen, Technical University of Denmark<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Multi-Task Reinforcement Learning with Soft Modularization<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td 
style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Ruihan Yang, UC San Diego<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Rishabh Agarwal, Google Research, and Mila Research<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">A Regret Minimization Approach to Iterative Learning Control<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Karan Singh, Princeton University<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">RMP2: A Differentiable Policy Class for Robotic Systems with Control-Theoretic Guarantees<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Anqi Li, University of Washington<\/td>\n<\/tr>\n<tr>\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Generating Adversarial Disturbances for 
Controller Verification<\/td>\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Udaya Ghai, Princeton University<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div style=\"height: 20px\"><\/div>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- wp:msr\/content-tab {\"title\":\"Call for papers\"} --><!-- wp:freeform --><h3>This event has now concluded.<\/h3>\n<h2>Call for virtual poster session<\/h2>\n<p>Reinforcement learning is the field that studies the problem of sequential decision making with unknown and potentially long-term consequences. Reinforcement learning is a multi-disciplinary topic, bringing together diverse fields of study including computer science, cognitive science, mathematics, psychology, economics, control theory, and neuroscience. The common theme that connects these fields, and the core goal of reinforcement learning, is the question: <strong><em>How do natural and artificial systems learn to make decisions in complex, unknown environments based on limited, noisy, and possibly delayed feedback?<\/em><\/strong><\/p>\n<p>This virtual workshop aims to bring together researchers from industry and academia to share and discuss recent advances, challenges, and future research directions for reinforcement learning. Our goal is to highlight emerging research opportunities for the reinforcement learning community, particularly those driven by the evolving need for robust decision making in practical applications. 
Reinforcement Learning Day 2021 will provide an opportunity for different research communities to learn from each other and build on the latest knowledge in reinforcement learning and related disciplines.<\/p>\n<h3>Invited speakers<\/h3>\n<p>Reinforcement Learning Day 2021 will feature invited talks and conversations with leaders in the field, including <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\">Yoshua Bengio<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<\/a>, whose research covers a broad array of topics related to reinforcement learning. For more details please see the agenda page.<\/p>\n<h3>Virtual poster session<\/h3>\n<p>In addition to our speaker program, Reinforcement Learning Day 2021 will include a virtual poster session, showcasing recent and ongoing research in all areas of reinforcement learning.<\/p>\n<p>We invite you to submit posters on all topics related to reinforcement learning. 
Suggested topics include (but are certainly not limited to):<\/p>\n<ul>\n<li>Deep Reinforcement Learning<\/li>\n<li>Reinforcement Learning Theory<\/li>\n<li>Bandit Algorithms<\/li>\n<li>Multi-Agent Reinforcement Learning<\/li>\n<li>Reinforcement Learning Benchmarks and Datasets<\/li>\n<li>Reinforcement Learning with Natural Language<\/li>\n<li>Human-in-the-Loop Reinforcement Learning<\/li>\n<li>Imitation Learning<\/li>\n<li>Control Theory<\/li>\n<li>Cross-Disciplinary Research with Reinforcement Learning: Structured Prediction, Game Theory, Operations Research, Fairness, Active Learning, Causality, Privacy, etc.<\/li>\n<li>Applications of Reinforcement Learning: Recommender Systems, Robotics, Healthcare, Education, Conversational AI, Gaming, Finance, Neuroscience, Manufacturing, etc.<\/li>\n<\/ul>\n<h3>What to submit<\/h3>\n<p>We invite the submission of extended abstracts (1-4 pages) on topics related to reinforcement learning. Authors of accepted abstracts will be invited to present their work at our virtual poster session (via Microsoft Teams), giving authors the opportunity for in-depth discussions with other Reinforcement Learning Day 2021 participants, presenters, and Microsoft researchers. Abstract reviewing will be single-blind. From the submissions, we will accept only 10-15 presenters. 
Accepted presenters will be asked to prepare pre-recorded video presentations to complement the live discussion during the virtual poster session.<\/p>\n<p>Please submit your abstract to <a href=\"mailto:msrrlday@microsoft.com\">msrrlday@microsoft.com<\/a>.<\/p>\n<h3>Important dates<\/h3>\n<ul>\n<li>December 11, 2020: Abstract submission deadline<\/li>\n<li>December 22, 2020: Author notification<\/li>\n<li>January 14, 2021: Reinforcement Learning Day 2021 \u2013 virtual workshop!<\/li>\n<\/ul>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- \/wp:msr\/content-tabs -->","tab-content":[{"id":0,"name":"About","content":"Reinforcement learning is the study of decision making with consequences over time. The topic draws together multi-disciplinary efforts from computer science, cognitive science, mathematics, economics, control theory, and neuroscience. The common thread through all of these studies is: how do natural and artificial systems learn to make decisions in complex environments based on external, and possibly delayed, feedback.\r\n\r\nThis virtual workshop featured talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. 
A key objective was to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.\r\n<div style=\"height: 20px\"><\/div>\r\n<h3>Committee Chairs<\/h3>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/akshaykr\/\">Akshay Krishnamurthy<\/a>, Microsoft Research\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chinganc\/\">Ching-An Cheng<\/a>, Microsoft Research\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dimisra\/\">Dipendra Misra<\/a>, Microsoft Research\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/idamo\/\">Ida Momennejad<\/a>, Microsoft Research\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/t-roloft\/\">Robert Loftin<\/a>, Microsoft Research\r\n<div style=\"height: 60px\"><\/div>\r\n<h3>Microsoft\u2019s Event Code of Conduct<\/h3>\r\nMicrosoft\u2019s mission is to empower every person and every organization on the planet to achieve more. This includes virtual events Microsoft hosts and participates in, where we seek to create a respectful, friendly, and inclusive experience for all participants. As such, we do not tolerate harassing or disrespectful behavior, messages, images, or interactions by any event participant, in any form, at any aspect of the program including business and social activities, regardless of location.\r\n\r\nWe do not tolerate any behavior that is degrading to any gender, race, sexual orientation or disability, or any behavior that would violate <a href=\"https:\/\/www.microsoft.com\/en-us\/legal\/compliance\/default.aspx\">Microsoft\u2019s Anti-Harassment and Anti-Discrimination Policy, Equal Employment Opportunity Policy, or Standards of Business Conduct<\/a>. In short, the entire experience must meet our culture standards. We encourage everyone to assist in creating a welcoming and safe environment. 
Please <a href=\"https:\/\/app.convercent.com\/en-us\/Anonymous\/IssueIntake\/LandingPage\/65d3b907-0933-e611-8105-000d3ab03673\">report<\/a> any concerns, harassing behavior, or suspicious or disruptive activity. Microsoft reserves the right to ask attendees to leave at any time at its sole discretion.\r\n<div style=\"height: 20px\"><\/div>\r\n<div>[msr-button text=\"Report a concern\" url=\"https:\/\/app.convercent.com\/en-us\/Anonymous\/IssueIntake\/LandingPage\/65d3b907-0933-e611-8105-000d3ab03673\" new-window=\"true\" ]<\/div>"},{"id":1,"name":"Agenda","content":"<h3>This event has now concluded.<\/h3>\r\n<h2>Thursday,\u202fJanuary 14, 2021<\/h2>\r\n<table style=\"border-spacing: inherit;border-collapse: collapse;width: 100%;padding: 8px;text-align: left;border-bottom: 1px solid #000000\">\r\n<tbody>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Time (EST) <\/strong><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Session <\/strong><\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><strong> Speaker <\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">10:00 AM-10:15 AM<\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Welcome Remarks<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/08\/Akshay_Krishnamurthy_125x125.jpg\" alt=\"Portrait of Akshay Krishnamurthy\" width=\"80\" height=\"80\" \/><\/td>\r\n<td style=\"vertical-align: middle;width: 
30%;padding: 8px;border-bottom: 1px solid #000000\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/akshaykr\/\">Akshay Krishnamurthy<\/a>, Microsoft Research<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">10:15 AM-11:00 AM<\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">New Advances in Hierarchical Reinforcement Learning<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/01\/DoinaPrecup-Headshot.jpeg\" alt=\"Portrait of Doina Precup\" width=\"80\" height=\"80\" \/><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a href=\"https:\/\/www.cs.mcgill.ca\/~dprecup\/\">Doina Precup<\/a>, McGill University<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">11:00 AM-11:45 AM<\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Reinforcement Learning Debate: The State of RL and The Theory-Practice Divide<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><img class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/John-Langford_360x360-300x300.jpg\" alt=\"Portrait of John Langford\" width=\"80\" height=\"80\" \/>\r\n<div style=\"height: 8px\"><\/div>\r\n<a href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\"><img class=\"avatar avatar-180 photo msr-profile-image aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/11\/YoshuaBengio_Headshot-PHDS-5fad9df67a570.jpg\" 
alt=\"Portrait of Yoshua Bengio\" width=\"80\" height=\"80\" \/><\/a><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<\/a>, Microsoft Research\r\n<div style=\"height: 8px\"><\/div>\r\n<a href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\">Yoshua Bengio<\/a>,\u202fMila (Quebec AI Institute)<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">11:45 AM-12:15 PM<\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Break<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\">12:15 PM-1:45 PM<\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Virtual Poster Presentations<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Yunzong Xu, MIT<\/td>\r\n<\/tr>\r\n<tr>\r\n<td 
style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Taylor Expansion Policy Optimization<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Yunhao Tang, Columbia University<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Provably Efficient Policy Optimization with Thompson Sampling<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Haque Ishfaq, McGill University<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Active Imitation Learning with Noisy Guidance<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Kiant\u00e9 Brantley, University of Maryland<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Finite-Time Analysis of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td 
style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Sihan Zeng, Georgia Tech<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">META-Q-LEARNING<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Rasool Fakoor, Amazon Web Services<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Toward the Fundamental Limits of Imitation Learning<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Nived Rajaraman, UC Berkeley<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Multitask Bandit Learning through Heterogeneous Feedback Aggregation<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Zhi Wang, UC San Diego<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">\u201cIt\u2019s Unwieldy and it Takes a Lot of Time.\u201d Challenges and Opportunities for Creating Agents in Commercial Games<\/td>\r\n<td 
style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Mikhail Jacob, Microsoft Research, Cambridge UK<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">A Framework for Robust Learning and Control of Nonlinear Systems with Large Uncertainty<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Hoang Le, Microsoft Research, Redmond<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Learning Dynamic Belief Graphs to Generalize on Text-Based Games<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Eric Yuan, Microsoft Research, Montreal<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Frugal Optimization for Cost-Related Hyperparameters<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Qingyun Wu, Microsoft Research, NYC<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td 
style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Denis Yarats, New York University<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Self-Supervised Policy Adaptation During Deployment<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Nicklas Hansen, Technical University of Denmark<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Multi-Task Reinforcement Learning with Soft Modularization<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Ruihan Yang, UC San Diego<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Rishabh 
Agarwal, Google Research, and Mila Research<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">A Regret Minimization Approach to Iterative Learning Control<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Karan Singh, Princeton University<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">RMP2: A Differentiable Policy Class for Robotic Systems with Control-Theoretic Guarantees<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Anqi Li, University of Washington<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"vertical-align: middle;width: 20%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 35%;padding: 8px;border-bottom: 1px solid #000000\">Generating Adversarial Disturbances for Controller Verification<\/td>\r\n<td style=\"vertical-align: middle;width: 15%;padding: 8px;border-bottom: 1px solid #000000\"><\/td>\r\n<td style=\"vertical-align: middle;width: 30%;padding: 8px;border-bottom: 1px solid #000000\">Udaya Ghai, Princeton University<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<div style=\"height: 20px\"><\/div>"},{"id":2,"name":"Call for papers","content":"<h3>This event has now concluded.<\/h3>\r\n<h2>Call for virtual poster session<\/h2>\r\nReinforcement learning is a field that studies the problem of sequential decision making with unknown and potentially long-term 
consequences. Reinforcement learning is a multi-disciplinary topic, bringing together diverse fields of study including computer science, cognitive science, mathematics, psychology, economics, control theory, and neuroscience. The common theme that connects these fields, and the core goal of reinforcement learning, is the question: <strong><em>How do natural and artificial systems learn to make decisions in complex, unknown environments based on limited, noisy, and possibly delayed feedback?<\/em><\/strong>\r\n\r\nThis virtual workshop aims to bring together researchers from industry and academia to share and discuss recent advances, challenges, and future research directions for reinforcement learning. Our goal is to highlight emerging research opportunities for the reinforcement learning community, particularly those driven by the evolving need for robust decision making in practical applications. Reinforcement Learning Day 2021 will provide an opportunity for different research communities to learn from each other and build on the latest knowledge in reinforcement learning and related disciplines.\r\n<h3>Invited speakers<\/h3>\r\nReinforcement Learning Day 2021 will feature invited talks and conversations with leaders in the field, including <a href=\"https:\/\/yoshuabengio.org\/\" target=\"_blank\" rel=\"noopener\">Yoshua Bengio<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<\/a>, whose research covers a broad array of topics related to reinforcement learning. For more details, please see the agenda page.\r\n<h3>Virtual poster session<\/h3>\r\nIn addition to our speaker program, Reinforcement Learning Day 2021 will include a virtual poster session, showcasing recent and ongoing research in all areas of reinforcement learning.\r\n\r\nWe invite you to submit posters on all topics related to reinforcement learning. 
Suggested topics include (but are certainly not limited to):\r\n<ul>\r\n \t<li>Deep Reinforcement Learning<\/li>\r\n \t<li>Reinforcement Learning Theory<\/li>\r\n \t<li>Bandit Algorithms<\/li>\r\n \t<li>Multi-Agent Reinforcement Learning<\/li>\r\n \t<li>Reinforcement Learning Benchmarks and Datasets<\/li>\r\n \t<li>Reinforcement Learning with Natural Language<\/li>\r\n \t<li>Human-in-the-Loop Reinforcement Learning<\/li>\r\n \t<li>Imitation Learning<\/li>\r\n \t<li>Control Theory<\/li>\r\n \t<li>Cross-Disciplinary Research with Reinforcement Learning: Structured Prediction, Game Theory, Operations Research, Fairness, Active Learning, Causality, Privacy, etc.<\/li>\r\n \t<li>Applications of Reinforcement Learning: Recommender Systems, Robotics, Healthcare, Education, Conversational AI, Gaming, Finance, Neuroscience, Manufacturing, etc.<\/li>\r\n<\/ul>\r\n<h3>What to submit<\/h3>\r\nWe invite the submission of extended abstracts (1-4 pages) on topics related to reinforcement learning. Authors of accepted abstracts will be invited to present their work at our virtual poster session (via Microsoft Teams), giving them the opportunity for in-depth discussions with other Reinforcement Learning Day 2021 participants, presenters, and Microsoft researchers. Abstract reviewing will be single-blind. From the submissions, we will accept only 10-15 presenters. 
Accepted presenters will be asked to prepare pre-recorded video presentations to complement the live discussion during the virtual poster session.\r\n\r\nPlease submit your abstract to <a href=\"mailto:msrrlday@microsoft.com\">msrrlday@microsoft.com<\/a>.\r\n<h3>Important dates<\/h3>\r\n<ul>\r\n \t<li>December 11, 2020: Abstract submission deadline<\/li>\r\n \t<li>December 22, 2020: Author notification<\/li>\r\n \t<li>January 14, 2021: Reinforcement Learning Day 2021 \u2013 virtual workshop!<\/li>\r\n<\/ul>"}],"msr_startdate":"2021-01-14","msr_enddate":"2021-01-14","msr_event_time":"","msr_location":"Virtual","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"January 14, 2021","msr_register_text":"Watch now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720-960x540.jpg\" class=\"img-object-cover\" alt=\"Reinforcement Learning Day header: mix of AI icons on a blue background\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/10\/RLDay_AI_header_11-2020_1920x720-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","event_excerpt":"This 
virtual reinforcement learning workshop will feature talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control.","msr_research_lab":[199571],"related-researchers":[],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[395930],"related-projects":[],"related-opportunities":[],"related-publications":[],"related-videos":[718696],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/696090","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":22,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/696090\/revisions"}],"predecessor-version":[{"id":1146930,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/696090\/revisions\/1146930"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/705232"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=696090"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=696090"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=696090"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=696090"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=696090"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.co
m\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=696090"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=696090"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=696090"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=696090"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}