Real World Reinforcement Learning


Established: May 3, 2019





Real World Reinforcement Learning (Real-World RL) projects enable the next generation of machine learning using interactive reinforcement-based approaches to solve real-world problems. The heart of the Real-World RL projects and applications is a platform striving to enable people and organizations to continuously learn and adapt.




MSN

Goal: To provide MSN personalization for their news articles.

Approach: The Real-World RL platform was deployed inside of MSN’s infrastructure to enable a very rapid personalization rate across their world-wide deployments.

Results: The RL-based personalization at MSN provides, on average, a 26% click-through rate (CTR) improvement.

Goal: To provide an external client with a means to personalize various areas of their website.

Approach: The client used the web interface (REST API) of the Real-World RL platform to personalize their Top News articles, videos, and suggested articles.

Results: The Real-World RL platform ran for over two years and provided, on average, a 30% CTR improvement over their baseline (editor’s suggested rank).

Xbox

Goal: To provide the Microsoft marketing team with personalized “Top of Home” ad campaigns.

Approach: Microsoft’s internal marketing campaign manager (IRIS) used the Real-World RL platform to personalize two of the three Xbox “Top of Home” slots.

Results: The pilot had two phases: 1) The Microsoft Research RL team ran a counterfactual evaluation to estimate user engagement from real-world data collected over two weeks in June 2018. 2) The Real-World RL system was deployed in production for two weeks in November 2018, resulting in a 60% CTR improvement over a baseline random policy and increased user engagement metrics.
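
Phase 1 relies on counterfactual evaluation: estimating how a new policy would have performed using data logged under a different policy. The page does not say which estimator the team used. One standard choice is inverse propensity scoring (IPS), sketched below in Python. The function name, the log tuple layout, and the sample logs are illustrative assumptions, not the team's code or data.

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a policy's average reward.

    logs: iterable of (context, action, reward, propensity) tuples, where
          propensity is the probability the logging policy gave to the
          action it actually took (must be > 0).
    target_policy: function (context, action) -> probability that the
          policy being evaluated would pick that action.
    """
    total = 0.0
    n = 0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        total += reward * target_policy(context, action) / propensity
        n += 1
    return total / n if n else 0.0


# Hypothetical logs from a uniform-random logging policy over two slots
# (propensity 0.5 each). A reward of 1.0 stands for a click.
sample_logs = [
    ("console_user", 0, 1.0, 0.5),
    ("console_user", 1, 0.0, 0.5),
    ("pc_user", 1, 1.0, 0.5),
    ("pc_user", 0, 0.0, 0.5),
]


def always_slot_zero(context, action):
    """Hypothetical candidate policy: always show slot 0."""
    return 1.0 if action == 0 else 0.0


estimated_ctr = ips_estimate(sample_logs, always_slot_zero)
```

Because the logged actions were randomized with known probabilities, the estimate is unbiased for any candidate policy, which is why a random baseline policy is a useful source of evaluation data.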

Microsoft Surface

Goal: To provide the marketing team (MLEDCOP) the ability to perform website layout personalization. The pilot specifically targeted the page layout for Japan.

Approach: The Real-World RL platform was used to personalize different calls-to-action in three different webpages on the Japan website. The pilot was run in an A/B fashion, where the control used the original layout as provided by Design, and the treatment used the Real-World RL platform to personalize the layout based on the users accessing the website.

Results: The RL-based personalization provided an 80% CTR improvement over the control.
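
For readers unfamiliar with how a relative CTR improvement is computed from an A/B comparison like the one above, here is a minimal Python sketch. The function names and the click and impression counts are made up for illustration; they are not the pilot's data.

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(control_ctr, treatment_ctr):
    """Relative improvement of treatment over control, as a fraction."""
    if control_ctr <= 0:
        raise ValueError("control CTR must be positive")
    return (treatment_ctr - control_ctr) / control_ctr


# Hypothetical counts, chosen only to show the arithmetic.
control = ctr(clicks=40, impressions=1000)    # original layout
treatment = ctr(clicks=50, impressions=1000)  # personalized layout
lift = relative_lift(control, treatment)
print(f"Relative CTR lift: {lift:.0%}")
```

Note that the lift is relative to the control's CTR, not an absolute difference in percentage points.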

Skype

Goal: To provide Skype a means to optimize the length of their jitter buffer on a per-call basis in order to provide the best call quality possible to their end users.

Approach: The Skype team ran the Real-World RL platform on a subset of their call agents for a few weeks.

Results: When comparing results on their “treatment” traffic, the Skype team saw a 1.5% improvement on the Poor Call Quality Metric, a metric that is typically used to proxy how users felt about the quality of the call.

AFD Frontier

Goal: To provide the AFD Frontier team with a means to optimize the TCP/IP settings of their clusters to provide the best server configuration.

Approach: The AFD Frontier team used the Real-World RL platform for a 3-month pilot as part of the 2017 AI School.

Results: The Real-World RL system provided considerable lift over default behavior. The AI School project won the “Best project” award and is now the basis for an extended pilot between AFD and Microsoft Research.

In the news


Winner of 2019 ACM SIGAI Industry Award

The selection committee for the ACM SIGAI Industry Award for Excellence in Artificial Intelligence (AI) is pleased to announce that the Decision Service, created by the Real World Reinforcement Learning Team from Microsoft, has been chosen as the winner of the inaugural 2019 award. The committee was impressed with….

ACM SIGAI | June 12, 2019

Product Integration


Custom Decision Service, a Microsoft Research project, uses reinforcement learning for a cloud-based, contextual decision-making API that sharpens with experience in order to provide personalized content. The research pilot was successful and released as Azure Cognitive Services Personalizer Preview, enabling enterprises and application developers to create rich, personalized experiences for every user.
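
To make "contextual decision-making that sharpens with experience" concrete, the following is a toy epsilon-greedy contextual bandit in Python. It is a sketch of the general idea only: the class, its methods, and the context and action names are hypothetical, and it does not reproduce the algorithm or API of Custom Decision Service or Personalizer.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Toy contextual bandit: one running reward average per (context, action)."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon              # probability of exploring
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)      # observations per (context, action)
        self.values = defaultdict(float)    # mean reward per (context, action)

    def choose(self, context):
        """Pick an action for a hashable context."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore
        # Exploit: best estimated reward for this context so far.
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def learn(self, context, action, reward):
        """Update the running mean reward with one observed outcome."""
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical usage: choose content for a visitor, then report a click (1.0)
# or no click (0.0) back to the learner.
bandit = EpsilonGreedyBandit(["article", "video"], epsilon=0.1, seed=0)
shown = bandit.choose("mobile")
bandit.learn("mobile", shown, reward=1.0)
```

The choose-then-learn loop is what lets the decisions improve over time: each reported reward refines the estimate for that context, while occasional exploration keeps collecting data on the alternatives.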