{"id":709156,"date":"2020-12-07T07:55:00","date_gmt":"2020-12-07T15:55:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=709156"},"modified":"2021-06-24T14:20:48","modified_gmt":"2021-06-24T21:20:48","slug":"research-collection-reinforcement-learning-at-microsoft","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/research-collection-reinforcement-learning-at-microsoft\/","title":{"rendered":"Research Collection \u2013 Reinforcement Learning at Microsoft"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote alignwide has-text-align-wide is-layout-flow wp-block-quote-is-layout-flow\"><p><em>Reinforcement learning is about agents taking information from the world and learning a policy for interacting with it, so that they perform better. So, you can imagine a future where, every time you type on the keyboard, the keyboard learns to understand you better. Or every time you interact with some website, it understands better what your preferences are, so the world just starts working better and better at interacting with people.<\/em><\/p><cite>John Langford, Partner Research Manager, MSR NYC<\/cite><\/blockquote>\n\n\n\n<p>Fundamentally, reinforcement learning (RL) is an approach to machine learning in which a software agent interacts with its environment, receives rewards, and chooses actions that will maximize those rewards. Research on reinforcement learning goes back many decades and is rooted in work in many different fields, including animal psychology, and some of its basic concepts were explored in the earliest research on artificial intelligence \u2013 such as Marvin Minsky\u2019s 1951 SNARC machine, which used an ancestor of modern reinforcement learning techniques to simulate a rat solving a maze.<\/p>\n\n\n\n<p>In the 1990s and 2000s, theoretical and practical work in reinforcement learning began to accelerate, leading to the rapid progress we see today. 
The theory behind reinforcement learning continues to advance, while its applications in real-world scenarios are leading to meaningful impact in many areas \u2013 from training autonomous systems to operate more safely and reliably in real-world environments, to making games more engaging and entertaining, to delivering more personalized information and experiences on the web.<\/p>\n\n\n\n<p>Below is a timeline of advances that researchers and their collaborators across Microsoft have made in reinforcement learning, along with <em>key milestones<\/em> in the field generally.<\/p>\n\n\n\n<h3 id=\"foundational-work-in-reinforcement-learning-1992-2014\" class=\"moment__title\">Foundational work in reinforcement learning (1992-2014)<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li><em>In 1992, this <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/dl.acm.org\/doi\/10.1007\/BF00992696\" target=\"_blank\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and its Reinforce algorithm were instrumental in the development of policy optimization algorithms.<\/em><\/li><li><em>This 1995 <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/www.researchgate.net\/profile\/Nicolo_Cesa-Bianchi\/publication\/2265004_Gambling_in_a_rigged_casino_The_adversarial_multi-armed_bandit_problem\/links\/0fcfd5131565c9740c000000\/Gambling-in-a-rigged-casino-The-adversarial-multi-armed-bandit-problem.pdf\" target=\"_blank\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (and a later <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/rob.schapire.net\/papers\/AuerCeFrSc01.pdf\" target=\"_blank\">journal version<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>) presented a novel approach to solving the 
\u201cmulti-armed bandit problem\u201d without making any statistical assumptions about the distribution of payoffs.<\/em><\/li><li><em>This 1998 <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/www.cis.upenn.edu\/~mkearns\/papers\/KearnsSinghE3.pdf\" target=\"_blank\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (and a later <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.cis.upenn.edu\/~mkearns\/papers\/KearnsSinghE3.pdf\">journal version<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>) showed how to learn near-optimal behavior in general Markov Decision Processes.<\/em><\/li><li><em>This 2002 <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/www.cs.cmu.edu\/~.\/jcl\/papers\/aoarl\/Final.pdf\" target=\"_blank\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> showed the first conditions under which learning to improve a policy locally achieves optimal policies.<\/em><\/li><li><em>In 2007, bandits that are generalized to use features and context are named <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" 
href=\"https:\/\/proceedings.neurips.cc\/paper\/2007\/file\/4b04a686b0ad13dce35fa99fa4161c65-Paper.pdf\">contextual bandits<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/em><\/li><li><em>Also in 2007, the first public version of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/vowpalwabbit.org\/\" target=\"_blank\">Vowpal Wabbit<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> is released, offering fast, efficient, and flexible online learning, along with other machine learning techniques. John Langford and several of his colleagues on this project later join Microsoft Research to continue their work.<\/em><\/li><li><em>Microsoft researcher <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<\/a> presents a <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/hunch.net\/~jl\/interact.pdf\" target=\"_blank\">tutorial<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> on interactive learning at the Neural Information Processing Systems conference 
(NIPS 2013)<\/em><\/li><li><em>In 2014, Richard Sutton and Andrew Barto publish <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/web.stanford.edu\/class\/psych209\/Readings\/SuttonBartoIPRLBook2ndEd.pdf\" target=\"_blank\">Reinforcement Learning: An Introduction<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, recounting work in the field that began in the late 1970s.<\/em><\/li><\/ul>\n\n\n\t<div class=\"wp-block-msr-block-journey journey journey--date alignwide\" data-bi-aN=\"block-journey\">\n\t\t<ol class=\"journey__list\">\n\t\t\t\n\t<li class=\"wp-block-msr-block-moment moment has-date\" data-bi-aN=\"block-moment\">\n\t\t<div class=\"moment__dot moment__dot--start\" role=\"presentation\"><\/div>\n\t\t<div role=\"presentation\"><\/div>\n\t\t<div class=\"moment__details\">\n\t\t\t\t\t\t<div class=\"moment__counter\"><\/div>\n\t\t\t\t\t\t\t<div class=\"moment__date-year\">\n\t\t\t\t\t2016\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t<div class=\"moment__content\">\n\t\t\t\n\n<h3 id=\"work-begins-on-project-malmo\" class=\"moment__title\">Work begins on Project Malmo<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"247\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/malmo_human_ai_interaction-web.png\" alt=\"Project malmo\" class=\"wp-image-235757\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/malmo_human_ai_interaction-web.png 800w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/malmo_human_ai_interaction-web-300x93.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/malmo_human_ai_interaction-web-768x237.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure>\n\n\n\n<p>Researchers at Microsoft Research Cambridge introduce the Malmo 
Platform for Artificial Intelligence Experimentation (<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-malmo\/\">Project Malmo<\/a>), which uses Minecraft as a platform to help AI learn to make sense of complex environments, learn from others, interact with the world, learn transferable skills and apply them to solve new problems.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/malmo-platform-artificial-intelligence-experimentation\/\" data-bi-cN=\"The Malmo Platform for Artificial Intelligence Experimentation\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>The Malmo Platform for Artificial Intelligence Experimentation<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Tool<\/span>\n\t\t\t<a 
href=\"https:\/\/github.com\/Microsoft\/malmo\" data-bi-cN=\"Malmo\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Malmo<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\t\t<\/div>\n\t\t<div class=\"moment__dot moment__dot--end\" role=\"presentation\"><\/div>\n\t<\/li>\n\t\n\n\t<li class=\"wp-block-msr-block-moment moment has-date\" data-bi-aN=\"block-moment\">\n\t\t<div class=\"moment__dot moment__dot--start\" role=\"presentation\"><\/div>\n\t\t<div role=\"presentation\"><\/div>\n\t\t<div class=\"moment__details\">\n\t\t\t\t\t\t<div class=\"moment__counter\"><\/div>\n\t\t\t\t\t\t\t<div class=\"moment__date-year\">\n\t\t\t\t\t2017\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t<div class=\"moment__content\">\n\t\t\t\n\n<h3 id=\"airsim-for-real-world-rl\" class=\"moment__title\">AirSim for real world RL<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"225\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/02\/DroneGIF-02a.gif\" alt=\"drone animation\" class=\"wp-image-363695\" \/><\/figure>\n\n\n\n<p>Microsoft researchers begin work on the Aerial Informatics and Robotics Platform (<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/aerial-informatics-robotics-platform\/\">AirSim<\/a>), an open-source robotics simulation platform that designers can use to generate the massive datasets required to train ground vehicles, wheeled robotics, aerial drones and other devices \u2013 without costly real-world field operations.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow 
wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Podcast<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/autonomous-systems-aerial-robotics-and-game-of-drones-with-gurdeep-pall-and-dr-ashish-kapoor\/\" data-bi-cN=\"Autonomous systems, aerial robotics and Game of Drones with Gurdeep Pall and Dr. Ashish Kapoor\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Autonomous systems, aerial robotics and Game of Drones with Gurdeep Pall and Dr. Ashish Kapoor<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/toward-ai-operates-real-world\/\" data-bi-cN=\"Toward AI that operates in the real world\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Toward AI that operates in the real world<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p><\/p>\n<\/div>\n<\/div>\n\n\n\n<h3 
id=\"hybrid-reward-architecture-wins-ms-pac-man\">Hybrid Reward Architecture wins Ms. Pac-Man<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"582\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/06\/download-1024x582.gif\" alt=\"Hybrid Reward Architecture for Ms. Pac-Man\" class=\"wp-image-455751\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/06\/download-1024x582.gif 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/06\/download-300x171.gif 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/06\/download-768x437.gif 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/hybrid-reward-architecture\/\">Hybrid Reward Architecture<\/a> project is established, combining standard reinforcement learning techniques with deep neural networks, with the aim of outperforming humans in Arcade Learning Environment (ALE) games. It achieves a perfect score on Ms. Pac-Man.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/hybrid-reward-architecture-achieving-super-human-ms-pac-man-performance\/\" data-bi-cN=\"Hybrid Reward Architecture (HRA) Achieving super-human performance on Ms. 
Pac-Man\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Hybrid Reward Architecture (HRA) Achieving super-human performance on Ms. Pac-Man<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/hybrid-reward-architecture-reinforcement-learning\/\" data-bi-cN=\"Hybrid Reward Architecture for Reinforcement Learning\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Hybrid Reward Architecture for Reinforcement Learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Podcast<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/hybrid-reward-architecture-fall-ms-pac-man-dr-harm-van-seijen\/\" data-bi-cN=\"Hybrid Reward Architecture and the Fall of Ms. Pac-Man with Dr. 
Harm van Seijen\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Hybrid Reward Architecture and the Fall of Ms. Pac-Man with Dr. Harm van Seijen<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\t\t<\/div>\n\t\t<div class=\"moment__dot moment__dot--end\" role=\"presentation\"><\/div>\n\t<\/li>\n\t\n\n\t<li class=\"wp-block-msr-block-moment moment has-date\" data-bi-aN=\"block-moment\">\n\t\t<div class=\"moment__dot moment__dot--start\" role=\"presentation\"><\/div>\n\t\t<div role=\"presentation\"><\/div>\n\t\t<div class=\"moment__details\">\n\t\t\t\t\t\t<div class=\"moment__counter\"><\/div>\n\t\t\t\t\t\t\t<div class=\"moment__date-year\">\n\t\t\t\t\t2018\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t<div class=\"moment__content\">\n\t\t\t\n\n<h3 id=\"bonsai-rl-for-autonomous-systems\" class=\"moment__title\">Bonsai: RL for autonomous systems<\/h3>\n\n\n\n<p>Microsoft <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/blogs.microsoft.com\/blog\/2018\/06\/20\/microsoft-to-acquire-bonsai-in-move-to-build-brains-for-autonomous-systems\/\" target=\"_blank\">acquires<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> Bonsai, which developed a novel \u201cmachine teaching\u201d approach, based on reinforcement learning, that abstracts its low-level mechanics. 
This enables subject matter experts to specify and train autonomous systems to accomplish tasks, regardless of their AI experience.<\/p>\n\n\n\n<h3 id=\"teaching-agents-language-decision-making-using-games\">Teaching agents language, decision-making using games<\/h3>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Tool<\/span>\n\t\t\t<a href=\"https:\/\/github.com\/microsoft\/textworld\" data-bi-cN=\"TextWorld\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>TextWorld<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>Microsoft Research Montreal researchers introduce <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/textworld\/\">TextWorld<\/a>, an open-source, extensible engine that generates and simulates text games. 
This can be used to train reinforcement learning agents to learn skills such as language understanding and grounding, as well as sequential decision-making.<\/p>\n\n\t\t<\/div>\n\t\t<div class=\"moment__dot moment__dot--end\" role=\"presentation\"><\/div>\n\t<\/li>\n\t\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/textworld-a-learning-environment-for-training-reinforcement-learning-agents-inspired-by-text-based-games\/\" data-bi-cN=\"TextWorld: A learning environment for training reinforcement learning agents, inspired by text-based games\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>TextWorld: A learning environment for training reinforcement learning agents, inspired by text-based games<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/textworld-a-learning-environment-for-text-based-games\/\" data-bi-cN=\"TextWorld: A Learning Environment for Text-based Games\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>TextWorld: A Learning Environment for Text-based Games<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>Podcast: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/malmo-minecraft-and-machine-learning-with-dr-katja-hofmann\/\">Malmo, Minecraft and machine learning with Dr. Katja Hofmann<\/a><br><\/p>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/content.blubrry.com\/microsoftresearch\/msr_hofmann_036.mp3\"><\/audio><\/figure>\n\n\n\n<p class=\"has-small-font-size\"><em>Podcast excerpt: \u201cI look at how artificial agents can learn to interact with complex environments. And I\u2019m particularly excited about possibilities of those environments being ones where they interact with humans. So, one area is, for example, in video games, where AI agents that learn to interact intelligently could really enrich video games and create new types of experiences. 
For example, learn directly from their interactions with players, remember what kinds of interactions they\u2019ve had and be really more relatable and more responsive to what is actually going on in the game and how they\u2019re interacting with the player.\u201d Katja Hofmann, Principal Researcher, Microsoft Research Cambridge.<\/em><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/project-malmo-reinforcement-learning-in-a-complex-world\/\" data-bi-cN=\"Project Malmo: Reinforcement learning in a complex world\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Project Malmo: Reinforcement learning in a complex world<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/challenge-accepted-marlo-competition-among-conference-highlights\/\" data-bi-cN=\"Challenge accepted - MARL\u00d6 competition among conference highlights\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Challenge accepted &#8211; MARL\u00d6 competition among conference highlights<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 id=\"microsoft-research-asia-applies-rl-to-industry-challenges\">Microsoft Research Asia applies RL to industry challenges<\/h3>\n\n\n\n<p>Through deep engagement with customers in logistics, telecommunications, finance and other industries, researchers at Microsoft Research Asia worked to abstract many of their business tasks as a common sequential decision-making problem with interactive objects and a large-scale optimization space. This led to a unified service, powered by multi-agent deep reinforcement learning, designed to respond to customers\u2019 requests for AI solutions.<\/p>\n\n\n\n<p>This unified service consists of a cooperative policy learning framework with pre-trained heterogeneous representations and an optimization framework for graph sampling over imbalanced data. 
Customers have applied this service to successfully solve their real-world problems, such as resource repositioning, capacity provisioning, and portfolio rebalancing.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">News<\/span>\n\t\t\t<a href=\"https:\/\/news.microsoft.com\/apac\/features\/ai-and-cargo-shipping-full-speed-ahead-for-global-maritime-trade\/\" data-bi-cN=\"AI and cargo shipping: Full speed ahead for global maritime trade\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>AI and cargo shipping: Full speed ahead for global maritime trade<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">News<\/span>\n\t\t\t<a href=\"https:\/\/news.microsoft.com\/apac\/2018\/04\/23\/msra-and-oocl-embrace-ai-in-digital-transformation\/\" data-bi-cN=\"MSRA and OOCL embrace AI in digital transformation\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold 
text-decoration-none\"><span>MSRA and OOCL embrace AI in digital transformation<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\t<li class=\"wp-block-msr-block-moment moment has-date\" data-bi-aN=\"block-moment\">\n\t\t<div class=\"moment__dot moment__dot--start\" role=\"presentation\"><\/div>\n\t\t<div role=\"presentation\"><\/div>\n\t\t<div class=\"moment__details\">\n\t\t\t\t\t\t<div class=\"moment__counter\"><\/div>\n\t\t\t\t\t\t\t<div class=\"moment__date-year\">\n\t\t\t\t\t2019\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t<div class=\"moment__content\">\n\t\t\t\n\n<h3 id=\"microsoft-launches-azure-cognitive-services-personalizer\" class=\"moment__title\">Microsoft launches Azure Cognitive Services Personalizer<\/h3>\n\n\n\n<p>Microsoft researchers establish the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/real-world-reinforcement-learning\/\">Real World Reinforcement Learning<\/a> project, with the goal of enabling the next generation of machine learning using interactive reinforcement-based approaches to solve real-world problems.<\/p>\n\n\n\n<p>One result of this work is the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/personalizer\/\" target=\"_blank\">Azure Cognitive Services Personalizer<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, built on Microsoft Research\u2019s <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/custom-decision\/\">Custom Decision Service<\/a> and also supported by Vowpal Wabbit. 
In addition to its availability to the developer community, it is <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/real-world-reinforcement-learning\/#!incubations\">used<\/a> by many teams at Microsoft, including Xbox, MSN, Microsoft.com and the Experiences & Devices division.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/real-world-interactive-learning-cusp-enabling-new-class-applications\/\" data-bi-cN=\"Real world interactive learning at cusp of enabling new class of applications\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Real world interactive learning at cusp of enabling new class of applications<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Tutorial<\/span>\n\t\t\t<a href=\"https:\/\/hunch.net\/~rwil\/\" data-bi-cN=\"ICML 2017 Tutorial on 
Real World Interactive Learning\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>ICML 2017 Tutorial on Real World Interactive Learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 id=\"game-of-drones-competition\">Game of Drones competition<\/h3>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/game-of-drones-at-neurips-2019-simulation-based-drone-racing-competition-built-on-airsim\/\" data-bi-cN=\"Game of Drones at NeurIPS 2019: Simulation-based drone-facing competition built on AirSim\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Game of Drones at NeurIPS 2019: Simulation-based drone-facing competition built on AirSim<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-1024x576.png\" alt=\"Image from Game of Drones simulation\" class=\"wp-image-624957\" 
srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>At NeurIPS, Microsoft researchers host the first \u201cGame of Drones\u201d competition, in which teams race a quadrotor drone in AirSim to push the boundaries of building competitive autonomous systems. 
The competition focuses on trajectory planning and control, computer vision, and opponent drone avoidance.<\/p>\n\n\n\n<h3 id=\"project-paidia-established\">Project Paidia established<\/h3>\n\n\n\n<p>Microsoft Research Cambridge and game developer Ninja Theory establish <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-paidia\/\">Project Paidia<\/a> to drive state-of-the-art research in reinforcement learning aimed at novel applications in modern video games. Specifically, its early work focuses on creating agents that learn to collaborate with human players.<\/p>\n\n\n\n<p>Also in 2019:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>AirSim is <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-airsim-now-available-on-unity\/\">released<\/a> on the Unity platform.<\/li><li>Microsoft holds its first MineRL competition on sample-efficient reinforcement learning, in which participants attempt to mine a diamond in Minecraft using only four days of training time. 
The top solutions are recounted in this 2020 <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/retrospective-analysis-of-the-2019-minerl-competition-on-sample-efficient-reinforcement-learning\/\">paper<\/a>.<\/li><\/ul>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/three-new-reinforcement-learning-methods-aim-to-improve-ai-in-gaming-and-beyond\/\" data-bi-cN=\"Three new reinforcement learning methods aim to improve AI in gaming and beyond - Microsoft Research\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Three new reinforcement learning methods aim to improve AI in gaming and beyond &#8211; Microsoft Research<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Podcast<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/reinforcement-learning-for-the-real-world-with-dr-john-langford-and-rafah-hosn\/\" 
data-bi-cN=\"Reinforcement learning for the real world with Dr. John Langford and Rafah Hosn\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Reinforcement learning for the real world with Dr. John Langford and Rafah Hosn<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Video<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/video\/minerl-competition-2019\/\" data-bi-cN=\"MineRL Competition 2019\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>MineRL Competition 2019<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Webinar<\/span>\n\t\t\t<a href=\"https:\/\/note.microsoft.com\/MSR-Webinar-RL-Algorithm-to-Adoption-Registration-On-Demand.html\" data-bi-cN=\"Exploring Reinforcement Learning Methods from Algorithm to Application\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold 
<span>Exploring">
text-decoration-none\"><span>Exploring Reinforcement Learning Methods from Algorithm to Application<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Webinar<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/reinforcement-learning-for-the-real-world-with-dr-john-langford-and-rafah-hosn\/\" data-bi-cN=\"Reinforcement learning for the real world with Dr. John Langford and Rafah Hosn\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Reinforcement learning for the real world with Dr. John Langford and Rafah Hosn<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 id=\"suphx-uses-rl-to-outperform-human-players-in-mahjong\">Suphx uses RL to outperform human players in Mahjong<\/h3>\n\n\n\n<p>Mahjong is a popular multi-player imperfect-information game that is very challenging for AI research due to its complex playing\/scoring rules and rich hidden information. In 2019, Microsoft researchers designed Super Phoenix (Suphx), an AI for Mahjong based on deep reinforcement learning with several newly introduced techniques, including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all officially ranked human players on the Tenhou platform. 
This is the first time that a computer program has outperformed most top human players in Mahjong.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">News<\/span>\n\t\t\t<a href=\"https:\/\/news.microsoft.com\/apac\/features\/mastering-mahjong-with-ai-and-machine-learning\/\" data-bi-cN=\"More than a game: Mastering Mahjong with AI and machine learning\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>More than a game: Mastering Mahjong with AI and machine learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Project<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/suphx-mastering-mahjong-with-deep-reinforcement-learning\/\" data-bi-cN=\"Suphx: The World Best Mahjong AI\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Suphx: The World Best Mahjong AI<\/span>&nbsp;<span class=\"glyph-in-link 
glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\t\t<\/div>\n\t\t<div class=\"moment__dot moment__dot--end\" role=\"presentation\"><\/div>\n\t<\/li>\n\t\n\n\t<li class=\"wp-block-msr-block-moment moment has-date\" data-bi-aN=\"block-moment\">\n\t\t<div class=\"moment__dot moment__dot--start\" role=\"presentation\"><\/div>\n\t\t<div role=\"presentation\"><\/div>\n\t\t<div class=\"moment__details\">\n\t\t\t\t\t\t<div class=\"moment__counter\"><\/div>\n\t\t\t\t\t\t\t<div class=\"moment__date-year\">\n\t\t\t\t\t2020\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t<div class=\"moment__content\">\n\t\t\t\n\n<h3 id=\"\" class=\"moment__title\"><\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/1gew6o3qn6vx9kp3s42ge0y1-wpengine.netdna-ssl.com\/wp-content\/uploads\/prod\/sites\/171\/2020\/05\/moab-photo-768x432.jpg\" alt=\"\" \/><\/figure>\n\n\n\n<p>At Microsoft Build, the company <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/blogs.microsoft.com\/ai-for-business\/build-bonsai-public-preview\/\" target=\"_blank\">makes Project Bonsai available for public preview<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and introduces the Moab robotics platform for developers to test its capabilities.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase 
font-weight-semibold text-neutral-300 small\">Podcast<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/provably-efficient-reinforcement-learning-with-dr-akshay-krishnamurthy\/\" data-bi-cN=\"Provably efficient reinforcement learning with Dr. Akshay Krishnamurthy\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Provably efficient reinforcement learning with Dr. Akshay Krishnamurthy<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"\" data-bi-cN=\"Microsoft holds its second MineRL competition\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Microsoft holds its second MineRL competition<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/aka.ms\/MSRBlogRLNeurIPS20\" data-bi-cN=\"NeurIPS 2020: Moving toward real-world reinforcement learning via batch RL, strategic exploration, and 
representation learning\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>NeurIPS 2020: Moving toward real-world reinforcement learning via batch RL, strategic exploration, and representation learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\t\t<\/div>\n\t\t<div class=\"moment__dot moment__dot--end\" role=\"presentation\"><\/div>\n\t<\/li>\n\t\n\n\t<li class=\"wp-block-msr-block-moment moment has-date\" data-bi-aN=\"block-moment\">\n\t\t<div class=\"moment__dot moment__dot--start\" role=\"presentation\"><\/div>\n\t\t<div role=\"presentation\"><\/div>\n\t\t<div class=\"moment__details\">\n\t\t\t\t\t\t<div class=\"moment__counter\"><\/div>\n\t\t\t\t\t\t\t<div class=\"moment__date-year\">\n\t\t\t\t\t2021\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t<div class=\"moment__content\">\n\t\t\t\n\n<h3 id=\"\" class=\"moment__title\"><\/h3>\n\n\n\n<p>Microsoft will host its third <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day-2021\/#!opportunities\">Reinforcement Learning Day<\/a> event in January 2021. This virtual workshop will feature talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. A key objective is to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.<\/p>\n\n\t\t<\/div>\n\t\t<div class=\"moment__dot moment__dot--end\" role=\"presentation\"><\/div>\n\t<\/li>\n\t\n\t\t<\/ol>\n\t<\/div>\n\t","protected":false},"excerpt":{"rendered":"<p>Reinforcement learning is about agents taking information from the world and learning a policy for interacting with it, so that they perform better. 
So, you can imagine a future where, every time you type on the keyboard, the keyboard learns to understand you better. Or every time you interact with some website, it understands better [&hellip;]<\/p>\n","protected":false},"author":38004,"featured_media":624957,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[244017],"tags":[186547],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-709156","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-collection","tag-reinforcement-learning","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199571],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[863034],"related-projects":[669597,568491,577638,442191,359810,235753],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"Image from Game of Drones simulation\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-960x540.png 960w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/12\/Research_20191202_NeurIPS_GameOfDrones_Site_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"December 7, 2020","formattedExcerpt":"Reinforcement learning is about agents taking information from the world and learning a policy for interacting with it, so that they perform better. So, you can imagine a future where, every time you type on the keyboard, the keyboard learns to understand you better. 
Or&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/709156","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38004"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=709156"}],"version-history":[{"count":43,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/709156\/revisions"}],"predecessor-version":[{"id":756799,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/709156\/revisions\/756799"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/624957"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=709156"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=709156"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=709156"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=709156"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=709156"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=709156"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/ms
r-locale?post=709156"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=709156"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=709156"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=709156"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=709156"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}