Microsoft Research Blog

Project Malmo competition returns with student organizers and a new mission: To democratize reinforcement learning

August 9, 2019 | By Noboru Sean Kuno, Senior Research Program Manager

When I was asked about my favorite movie in a game with friends after my wedding ceremony, I replied Star Wars. That was about two decades ago, and, yes, it's still the case. I especially like Return of the Jedi. The third installment in the original trilogy is almost perfect to me. Luke Skywalker returns to fight back against the Empire as a member of the Rebel Alliance with the help of his old friend Han Solo and new friends the Ewoks. It's a must-see as far as I'm concerned. Third stories have proven to be special in other franchise masterpieces, too, such as The Lord of the Rings, Back to the Future, and Indiana Jones.

The MineRL competition is the third in a trilogy of a different sort—contests based on Project Malmo, an AI experimentation platform built on top of Minecraft—and it’s distinguishing itself from other contests and its Malmo predecessors in really exciting ways.

MineRL is the first of its kind to put a premium on agent training efficiency, and we believe it’s the first competition to explicitly take advantage of an approach that combines reinforcement learning and imitation learning with a large dataset. And while The Malmo Collaborative AI Challenge in 2017 was organized by Microsoft and The Multi-Agent Reinforcement Learning In Malmo (MARLO) Competition in 2018 was co-organized by Microsoft, Queen Mary University of London, and CrowdAI, now AIcrowd, this year’s competition was proposed by and is based on the work of students from Carnegie Mellon University.

The power of competition

CMU PhD student William Guss, the competition's lead organizer, has long been interested in doing machine learning in Minecraft, drawn to the game by the ability of its open-world environment to reflect the nature of real-world tasks and challenges. It's why researchers here at Microsoft Research like it, too. William was intrigued by Project Malmo, but saw limitations in current reinforcement learning tools and methods that made it difficult to take full advantage of the unique training ground provided by the game and platform. State-of-the-art reinforcement learning systems require rapidly increasing amounts of samples and computing resources, making it hard to replicate and improve those systems, let alone apply them in the real world. Additionally, the reward functions that reinforcement learning relies on aren't well suited to specifying the kind of general intelligence researchers hope their agents can eventually achieve.

In response, William, Brandon Houghton, and several other CMU students developed technology to record the completion of various tasks in Minecraft, creating a large-scale dataset of human demonstrations called MineRL-v0. They realized, though, that the dataset wouldn't be nearly as valuable without more efficient algorithms to use it. Having seen the success of machine learning competitions such as the ImageNet challenge in galvanizing research in a particular direction, they began considering a competition designed around sample-efficient and imitation-based reinforcement learning using their dataset. With this in the back of their minds and ready to release the dataset, they reached out to Microsoft about collaborating in general. Both parties came to the realization that partnering on a competition was a natural fit, and the MineRL competition was born.

Making AI more inclusive

In the competition, which is run in partnership with Queen Mary University of London, AIcrowd, Preferred Networks, and Microsoft, participants have to develop a system that obtains a diamond in Minecraft using only four days of training time and no more than 10 million samples. To put the challenge into perspective, it has taken between 44 million and more than 200 million samples to train deep reinforcement learning models to play Atari 2600 games as well as a person. These limits are meant to encourage efficiency, which the CMU team sees as serving the larger goal behind the competition's design: the democratization of reinforcement learning.
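To make the comparison concrete, the short Python sketch below works through the arithmetic using only the figures quoted above: the 10-million-sample cap, the four-day time limit, and the 44 million to "more than 200 million" samples reported for human-level Atari play. The per-second figure is an illustrative assumption of ours: it treats the two limits as if the full sample budget were spread evenly across the full four days, which is not a rule of the competition.

```python
# Illustrative arithmetic using the figures quoted in this post.
MINERL_SAMPLE_BUDGET = 10_000_000    # maximum number of samples allowed
MINERL_TRAINING_DAYS = 4             # maximum training time, in days
ATARI_SAMPLES_LOW = 44_000_000       # low end quoted for human-level Atari play
ATARI_SAMPLES_HIGH = 200_000_000     # "more than 200 million", so a lower bound


def budget_ratio(samples_needed: int, budget: int = MINERL_SAMPLE_BUDGET) -> float:
    """Return how many times larger a sample count is than the MineRL budget."""
    return samples_needed / budget


def samples_per_second(budget: int = MINERL_SAMPLE_BUDGET,
                       days: int = MINERL_TRAINING_DAYS) -> float:
    """Average rate needed to spend the whole budget evenly over the time limit.

    Assumption for illustration only: samples are consumed at a constant rate.
    """
    return budget / (days * 24 * 60 * 60)


if __name__ == "__main__":
    print(f"Atari, low end:  {budget_ratio(ATARI_SAMPLES_LOW):.1f}x the MineRL budget")
    print(f"Atari, high end: at least {budget_ratio(ATARI_SAMPLES_HIGH):.0f}x the MineRL budget")
    print(f"Whole budget over {MINERL_TRAINING_DAYS} days: about "
          f"{samples_per_second():.1f} samples per second")
```

Run as a script, it reports that the Atari results used roughly 4.4 to at least 20 times the MineRL sample budget, and that spreading 10 million samples over four days works out to about 28.9 samples per second.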

Reinforcement learning is so data-dependent that only those with access to large amounts of data and computing power are able to work in and make contributions to the space, limiting the scope and pace of advancement. Inclusivity is so integral to what the competition is trying to accomplish that its infrastructure includes computational and travel grants, provided by Microsoft, to help people from groups underrepresented in the research community participate in the competition and travel to the 2019 Conference on Neural Information Processing Systems (NeurIPS). MineRL is part of the NeurIPS competition track, and the CMU team will host a workshop showcasing methods from the competition at the conference. As William put it, "The concentration of computational power and resources to those currently within the field and already with the means to research reinforcement learning, in some sense, impacts those underrepresented communities the most." With MineRL, the CMU team hopes to lower the barriers to entry by making reinforcement learning more sample efficient.

Get in on the competition

The first round of the competition is open on the AIcrowd platform, and submissions are being accepted until late September or early October. The CMU team's MineRL Python package, which includes a Malmo extension and tools for downloading the MineRL-v0 dataset, has already been downloaded more than 10,000 times, and more than 700 teams have signed up for the competition, the most sign-ups of any NeurIPS competition. "Seeing the work that we've put into this competition having a tangible effect on the research community has been the most fulfilling aspect of organizing," William told us.

If you want to learn more about MineRL-v0—which is more than 60 million samples strong—check out the paper “MineRL: A Large-Scale Dataset of Minecraft Demonstrations.” The CMU team will be presenting the paper at the 2019 International Joint Conference on Artificial Intelligence Aug. 10–16 in Macao, China. To contribute to the dataset, visit the MineRL server that has been set up for data collection.

"Our collaboration with the team led by CMU has been fantastic," said Katja Hofmann, Research Lead of Project Malmo and Principal Research Manager at Microsoft Research Cambridge. "I am very happy to see such an exciting competition being organized on Project Malmo, which we have developed and made open source to the research community. This competition is a great example of how the platform enables a very wide range of research."

Return of the Jedi is not just the third story of the original trilogy; it opened up the prequel trilogy and the sequel trilogy. We’re looking forward to seeing another story of ambitious students who take advantage of the Malmo platform to pursue their research agenda.

The MineRL competition organizing team

William H. Guss, Carnegie Mellon University
Mario Ynocente Castro, Preferred Networks
Cayden Codel, Carnegie Mellon University
Katja Hofmann, Microsoft Research
Brandon Houghton, Carnegie Mellon University
Noboru Kuno, Microsoft Research
Crissman Loomis, Preferred Networks
Keisuke Nakata, Preferred Networks
Stephanie Milani, University of Maryland, Baltimore County and Carnegie Mellon University
Sharada Mohanty, AIcrowd
Diego Perez Liebana, Queen Mary University of London
Ruslan Salakhutdinov, Carnegie Mellon University
Shinya Shiroshita, Preferred Networks
Nicholay Topin, Carnegie Mellon University
Avinash Ummadisingu, Preferred Networks
Manuela Veloso, Carnegie Mellon University
Phillip Wang, Carnegie Mellon University
