Malmo, Minecraft and machine learning with Dr. Katja Hofmann

Published August 29, 2018

Dr. Katja Hofmann, Researcher at Microsoft Research

Episode 39, August 29, 2018

The wildly popular video game, Minecraft, might appear to be an unlikely candidate for machine learning research, but to Dr. Katja Hofmann, the research lead of Project Malmo in the Machine Intelligence and Perception Group at Microsoft Research in Cambridge, England, it’s the perfect environment for teaching AI agents, via reinforcement learning, to act intelligently – and cooperatively – in the open world.

Today, Dr. Hofmann talks about her vision of a future where machines learn to collaborate with people and empower them to help solve complex, real-world problems. She also shares the story of how her early years in East Germany, behind the Iron Curtain, shaped her both personally and professionally, and ultimately facilitated a creative, exploratory mindset about computing that informs her work to this day.

Episode Transcript

Katja Hofmann: What we really designed Malmo for was for this broad exchange between industry and the academic community. AI and reinforcement learning are fascinating techniques and this whole area is developing very, very quickly. But it’s not quite clear where the next new insight is going to come from. So, we were really envisioning this as a meta-platform that others could be using to start to compare, start to integrate the different approaches, and really generate new insights and understanding how to push this area forward.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Katja Hofmann, welcome to the podcast.

Katja Hofmann: Thanks for having me.

Host: You’re a researcher in the Machine Intelligence group in Cambridge. Give us a brief description of the work you do and the things you’re working on. In broad strokes what gets you up in the morning?

Katja Hofmann: I broadly work in the area of multi-agent reinforcement learning. So, I look at how artificial agents can learn to interact with complex environments. And I’m particularly excited about possibilities of those environments being ones where they interact with humans. So, one area is, for example, in video games, where AI agents that learn to interact intelligently could really enrich video games and create new types of experiences. For example, learn directly from their interactions with players, remember what kinds of interactions they’ve had and be really more relatable and more responsive to what is actually going on in the game and how they’re interacting with the player.

Host: So, let’s drill in there just a little bit. What you’re talking about is what I think you call collaborative AI. Tell us a little bit more about what this is and why it’s an important line of inquiry in the broader field of AI research.

Katja Hofmann: I think of collaborative AI as really one of the key questions in artificial intelligence. So, when we think about machines that learn, or machines that perform certain tasks, the end goal of this is always, in my mind, a machine that is better at helping us achieve what we want to achieve. So, I really think about AI research as coming up with new ways to enable this collaboration between machines and humans in a huge variety of different applications. I think that general techniques like machine learning, reinforcement learning are particularly promising for pushing that forward and enabling new kinds of collaborations. But what I envision in the future are machines that understand what we’re trying to do and that can really reason about what it is that is most helpful to us to support us in achieving more.

Host: Yeah, so this kind of resonates with a lot of research that’s going on here at Microsoft Research, specifically this idea of “augment versus replace.” Talk to me a little bit more about the augmenting… where this application is most fruitful for us do you think?

Katja Hofmann: The application area that I think about the most is video games. And I see video games as an important stepping-stone towards enabling more general applications of collaborative AI. I think, in the long-term, there are a huge number of different applications starting from health to being creative… But I think for many of those applications, some of the key research questions we’re still trying to address are very, very hard to address in these open-ended, real-world environments. And video games form this really interesting intermediate stage, where we have very complex worlds that are extremely rich, that are really engaging for people. And we have a lot of scenarios where people interact in small or large communities within those fantastic worlds. So, there’s this space for introducing new technology, for understanding how agents within video games could learn to collaborate with people and then we can, once well-understood, take this technology and apply it to new application areas.

Host: I want to talk about this exciting application of collaborative AI that you’ve really poured a lot of time and effort into called Project Malmo. Tell us about it.

Katja Hofmann: Absolutely. So, Project Malmo is an AI experimentation platform that my team and I built on top of the popular video game Minecraft. And when we started developing this project we were really thinking about, what would the future platform for AI research look like? So what features, what capabilities would we need to support not just kind of addressing the next research questions that we were immediately focusing on, but that would enable a huge number of researchers, enthusiasts, to explore this space and push AI research forward for the next 10-15 years to come. So, we built a very generic platform on top of this Minecraft game to allow researchers to create different new tasks, to put different kinds of agents into the game and to really push the state-of-the-art forward.

Host: So why did you choose Minecraft to launch this platform?

Katja Hofmann: Minecraft seemed just the perfect fit for a project like this. If you’ve played Minecraft before, you’ll know that it’s kind of a sandbox game. It’s almost like a meta game where different communities, different players, go in and create amazing artifacts and new games within this game. There’s the concept of parkour races, where people set up kind of race courses to race against their friends. And there’s build battles where you have to construct creative new structures. So, people are using this sandbox game to come up with all these different ways of playing and interacting with each other. And if you think about a general-purpose platform for AI evaluation, then this is exactly what we need. We need to be able to have a platform that is general enough so that we can create some initial tasks that push the current state-of-the-art in reinforcement learning, or AI more generally, and then be able to expand that, build on that, to create more and more complex, more and more challenging tasks to throw at our agents to really push them moving forward. There are tasks in there around navigation that are at the level that can be addressed by current state-of-the-art approaches, all the way to a task that would require complex communication interaction in natural language. And we can support all those different scenarios within the platform.

Host: Let’s get into the weeds a little bit about the technology behind Malmo and what kinds of methodologies, approaches, techniques are you using to do this research?

Katja Hofmann: So, my team here is particularly interested in this area called reinforcement learning, where an agent starts with a clean slate or very little initial knowledge about the world. But it has to learn from interaction with its environment. So, for example, try a certain action and then learn about the consequences of that action in that world. But the Malmo platform doesn’t only support work on reinforcement learning; within Malmo, we aim to support all types of artificial intelligence research. So, we provide opportunities for more symbolic reasoning approaches all the way to the reinforcement learning type of approaches that I mentioned earlier.
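
To make the trial-and-error loop described here concrete, the following is a minimal sketch of tabular Q-learning, one standard reinforcement learning method. It is an illustration only and does not use the Malmo API: the five-cell corridor, the function names (train_corridor_agent, greedy_policy) and all parameter values are assumptions chosen for the example. The agent starts with no knowledge (all value estimates are zero), tries actions, observes what happens, and gradually learns that stepping right leads to the goal.

```python
import random

ACTIONS = (-1, +1)  # hypothetical action set: step left, step right


def train_corridor_agent(length=5, episodes=500, alpha=0.5, gamma=0.9,
                         epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor whose goal is the rightmost cell."""
    rng = random.Random(seed)
    # Clean slate: every state-action value estimate starts at zero.
    q = {(s, a): 0.0 for s in range(length) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != length - 1:
            if rng.random() < epsilon:
                # Occasionally try a random action to gather new experience.
                action = rng.choice(ACTIONS)
            else:
                # Otherwise act on current estimates, breaking ties randomly.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            # Observe the consequence of the action in the toy world.
            next_state = min(max(state + action, 0), length - 1)
            reward = 1.0 if next_state == length - 1 else 0.0
            # Move the estimate toward reward plus discounted future value.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q, length):
    """Best known action in every non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(length - 1)]


if __name__ == "__main__":
    q_table = train_corridor_agent()
    print(greedy_policy(q_table, 5))
```

In a real Malmo experiment the observations and actions would come from the game rather than from a hard-coded corridor; the update rule inside the loop is the part that carries over.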

Host: So, we’ve talked about researchers using deep learning for exploratory AI. Why is Malmo a good platform for this?

Katja Hofmann: So, reinforcement learning is the general technique, and then deep reinforcement learning is a specific part of that where you learn from very high-dimensional observations. So, for example, if you wanted to directly learn how to interpret visual signals that come from the environment, then you would use what’s called deep reinforcement learning to do that using neural networks. Then, exploration is a really key part within that. It’s one of the key challenges in reinforcement learning, to understand how an agent that is thrown into some arbitrary, complex world can collect new experiences, can collect data about this world in such a way that it learns to understand what kind of tasks, what kinds of goals, it could achieve within that world.

Host: Talk a little bit about the importance of simulation when we’re working with AI agents before they hit the open world.

Katja Hofmann: So, AI agents and, specifically, when we’re talking about reinforcement learning agents that learn from direct interaction with an environment, they essentially learn from trial and error. So, they need to try some action and some of those might fail. And they need to observe the negative consequences of those actions in order to form a good understanding about how the world works. Now, if we were to think about safety-critical applications like flight or self-driving cars or maybe the health space, then we want to make sure that if agents explore, they only try those actions that are actually sensible and have a good chance of success within those environments. Just like people do. We wouldn’t try arbitrary random things. We would try to address a problem by taking a path forward that has a good chance of giving us a good outcome. So, we would like those agents to be pre-trained as much as possible in a simulated environment where the agents can explore and learn from trial and error without having negative consequences for real-world environments. In a robotics case, this could be avoiding crashing a simulator, or crashing a robot. But once the system has learned from this initial simulated environment, then the key question is how to transfer that to the real environment and there’s actually a lot of work. Many other colleagues are focusing on that to look at how well learned behavior can be translated into real-world situations. And in many cases, it’s surprisingly effective to actually pre-train in simulation and then perform the task in a real-world environment.

Host: So, let’s talk a bit about this reinforcement learning. We’ve talked to other researchers about the rewards and… I don’t want to call them punishments, but how does an algorithm deal with rewards different from a human, right? We would feel embarrassed or ashamed or whatever that we made a mistake. An algorithm doesn’t have those feelings. What is the mechanism that you build into the machine learning techniques for reinforcement learning?

Katja Hofmann: That’s a fantastic question. And there are two ways in which I think about how these agents handle rewards. One side of this is kind of the reward structure that is imposed on the agent, in this case often by the experimenter or by the person designing a system. So, if you play a game, let’s say a parkour race in Minecraft, then the experimenter could look at this and say, well if you win the race then you get a positive reward, so we want to encourage that behavior, we give a plus one. If you lose, then you get a minus one. So, this is very much kind of a hand-tuned reward structure that would be application dependent. In some situations, there might be a reward structure that is very natural. So, for example when you play Atari games the score is a pretty good proxy reward structure. Or if you learn to play chess then winning or losing the game is a good one. But there are many application areas where there’s not an obvious reward structure. If you wanted to train an agent to help a human user perform whatever the user is trying to do, then you would need to think more generally about what is a good reward structure for such an agent that learns to cooperate with people. And this is actually one of the key directions that we’re focusing on within my group, to understand how to create reward structures that would be useful for learning this kind of cooperative or supportive behavior in agents.
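
The two kinds of reward structure mentioned in this answer can be written down in a few lines. The sketch below is a hedged illustration, not code from Malmo: race_reward, score_reward and discounted_return are hypothetical helper names, and the discount factor is an assumed value. The first function is the hand-tuned plus one / minus one signal for the parkour race, the second is the score-based proxy reward used in Atari-style games, and the third shows how a sequence of rewards is usually summarized into a single return that a learning agent tries to maximize.

```python
def race_reward(finished_first: bool) -> float:
    """Hand-tuned sparse reward for a race: +1 for a win, -1 for a loss."""
    return 1.0 if finished_first else -1.0


def score_reward(previous_score: int, current_score: int) -> float:
    """Proxy reward for score-based games: the change in score over one step."""
    return float(current_score - previous_score)


def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where a reward t steps in the future is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # A race in which nothing is rewarded until the final, winning step.
    episode = [0.0, 0.0, 0.0, race_reward(finished_first=True)]
    print(discounted_return(episode))
```

Neither function covers the harder case raised at the end of the answer, where the agent is supposed to help a person and no obvious score exists.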

(music plays)

Host: Your team has intentionally made the Malmo platform open source. And independent of, or agnostic to, the variety of methodologies and programming languages that researchers or developers might bring to the table. Why have you worked so hard to make this project so open?

Katja Hofmann: (laughs) That’s a fantastic question and if you ask my team it’s quite painful to develop for three different operating systems and at least five different programming languages. But what we really designed Malmo for was for this broad exchange between industry and the academic community. AI and reinforcement learning are fascinating techniques and this whole area is developing very, very quickly. But it’s not quite clear where the next new insight is going to come from. There are thousands of people, maybe you know hundreds of thousands of people, working on tackling some of those really hard challenges and it is often hard to compare very different approaches with each other, because different communities might have different tools, might be using different programming languages, might be using different benchmarks. So, we were really envisioning this as a meta-platform that others could be using to start to compare, start to integrate the different approaches, and really generate new insights and understanding how to push this area forward.

Host: So, you alluded, just now, to collaboration with academia. Industry and academia collaborating. What does each party bring to the party, so to speak, in this back-and-forth between applied and pure research?

Katja Hofmann: I think that’s a great question. And it’s one where I kind of think of myself as a little bit in the middle, as kind of having a foot in both. Because at Microsoft Research, we are really in the research area. So, some of our work very much looks like what would happen at an academic institution. But at the same time, we have access to this huge company where there’s a lot of interesting problems and product groups and you know, looking at people who are solving some of the really hard challenges that are experienced in industry. And that gives us a great source of both collaboration and inspiration of what’s coming. Maybe what key challenges need to be addressed and how we can frame our research or maybe think about our research in such a way that it can achieve maximal impact. So really thinking about well, if I focus on this area and answering those questions, how is this going to change the world? How is this going to change how we look at things? And what kind of real-world impact could that have? And I think this real-world perspective is something really valuable that the industry side brings to the table. On the academic side, you have the opportunity to take a longer-term view. I mean, some of the breakthrough technologies that we’re using today, like deep learning, have roots that go several decades back and have taken a huge amount of dedication and energy, really a sustained research program, on the academic side. And I think this long-term view, as well as the huge variety of different opinions and ideas that we see in the academic community, are extremely valuable and absolutely necessary for pushing the field forward. By bringing those two sides together, I think really interesting things can happen.

Host: When you launched Project Malmo a year ago you had a really overwhelming response. And since then there’ve been some exciting new developments with the project. Tell us what happened at the outset, what’s happened since and what you’re seeing on the horizon in the future.

Katja Hofmann: So, we were really excited, more than a year ago, in launching the platform and really seeing how the community would respond and what they would do with it. And it was really crazy and exciting to see how many people would pick up this platform and use it for a huge variety of different purposes. Some of those we had never thought about or never anticipated. There was quite some uptake in class projects to learn about concepts in AI and different approaches to AI. There are different enthusiasts that would just love interacting with the platform and seeing what they can come up with. And there’s a huge variety in terms of the research directions that people are establishing on top of it. At the same time, people asked about using specific benchmarks within the platform and this is what motivated us to really start looking at what benchmarks we would like to create in order to facilitate research in some of the key research areas that we think of as the most challenging at this point in time. So out of that discussion initially came the first Malmo collaborative AI challenge which we ran last year. Maybe some of the listeners remember the pig chase task that we ran there. That was a fantastic experience and we had very motivated participants from all over the world. But it also gave us a lot of new insights, new learnings about what went well, what could be improved, and how to move into a next round of creating a challenge that could have more targeted impact on the research community. So since then, we started reaching out. We’ve built up a network of academic collaborators which we’re very proud to work with. And those are teams at Queen Mary University in London, as well as EPFL, the university in Lausanne. And we put our heads together and looked at specifically the question of generality. So, how can we create a benchmark that would push the research towards learning approaches that would learn not just to perform well on a single task with maybe a single opponent in a multi-agent scenario, but that would really be pushing those approaches towards multi-task, multi-agent learning in this video game setting that we’re providing here.

Host: That’s actually quite exciting and challenging, right?

Katja Hofmann: It’s very challenging, yes.

Host: Do you have another challenge coming up or have you already launched another challenge?

Katja Hofmann: Absolutely. We’re just about to launch the MARLO Competition which is the Multi-Agent Reinforcement Learning Competition in Malmo. I believe by the time this podcast comes out it will already have been launched but we’re just preparing the platform for actually releasing the competition.

Host: How did you come up with the name Malmo?

Katja Hofmann: So, this was geographically inspired. As you may know, home of Minecraft is Stockholm where the game was originally developed and initiated. And my team is here in Cambridge in the UK. And Malmo is almost in the middle between those. It’s not a precise fit but it was the closest kind of large city that we found. And I hear that Malmo is a very energetic, young city with a lot of exciting things happening. So, it seems like a very good fit.

Host: Oh, that’s great. And just sort of a side question, how would you define who your audience is for the Malmo challenges?

Katja Hofmann: So, for the competitions we are targeting students in particular. So, we think students who have maybe had some experience with reinforcement learning or machine learning and are trying to test out their skills. This would be a fantastic competition to try out what they’ve learned and maybe push things forward. We think that the benchmark is a serious one for the academic community. So, we’d love to see exciting new research in multi-agent, multitask learning to be inspired by this.

Host: So, certain techniques in AI research like current exploration in probabilistic modeling are tackling the big problems of ambiguity, complexity and uncertainty, and a lot of interesting work in this area is coming out of the Cambridge lab. How are you seeing these probabilistic models affect the work that you’re doing?

Katja Hofmann: I think this is a fantastic area where two research areas that used to be quite distinct from each other are starting to move more closely together. If you think about reinforcement learning as learning from trial and error, then you need to think very deeply about uncertainty. So, if you already know what effects your actions will have in some environment then there’s no need to explore. You already know everything you need to know and you can just compute or develop an optimal strategy for acting in this environment. But a lot of times the experience of the agent will be very limited and then it’s crucial to have a good estimate, or be able to quantify that uncertainty. And it matters whether uncertainty is due to not having seen enough data in a particular part of the environment, or whether there’s true stochasticity. So, for example, the difference between playing a slot machine and exploring a dark cave. In one example, you see there’s a lot of uncertainty about the outcome but it’s because the slot machine is really random. In the other one, there’s a lot of uncertainty just because you may be at the entrance of the cave and you just haven’t explored the space yet. And a lot of recent work in probabilistic modeling, especially around stochastic neural networks, is taking very rapid steps forward towards capturing those different kinds of uncertainty. And there’s a very clear application area in reinforcement learning. Some key questions on how to effectively use those models to inform exploration and acting in uncertain environments.
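
The slot machine and dark cave distinction can be illustrated with a much simpler model than the stochastic neural networks mentioned here. The sketch below swaps in a Beta-Bernoulli model, a textbook Bayesian model for yes/no outcomes, purely as an assumption for illustration; the function name and the observation counts are hypothetical. It splits the variance of the next outcome into a part caused by limited data (large at the unexplored cave entrance, shrinking as data arrives) and a part caused by genuine randomness (which stays large for the slot machine no matter how often it is played).

```python
def beta_bernoulli_uncertainty(successes, failures, prior_a=1.0, prior_b=1.0):
    """Split predictive variance for a yes/no outcome into two parts.

    Returns (lack_of_data, true_randomness):
      lack_of_data    = variance of the posterior over the success probability p
      true_randomness = expected outcome noise p * (1 - p) under that posterior
    The two parts add up to the total variance of the next outcome.
    """
    a = prior_a + successes
    b = prior_b + failures
    n = a + b
    lack_of_data = a * b / (n * n * (n + 1.0))
    true_randomness = a * b / (n * (n + 1.0))
    return lack_of_data, true_randomness


if __name__ == "__main__":
    # Hypothetical observation counts chosen only for illustration.
    cave_entrance = beta_bernoulli_uncertainty(0, 0)        # nothing explored yet
    explored_cave = beta_bernoulli_uncertainty(1000, 0)     # same result every time
    slot_machine = beta_bernoulli_uncertainty(5000, 5000)   # well sampled, still random
    for name, parts in [("cave entrance", cave_entrance),
                        ("explored cave", explored_cave),
                        ("slot machine", slot_machine)]:
        print(name, parts)
```

An agent deciding where to explore would be drawn to the first kind of uncertainty, since more data reduces it, and would gain nothing from pulling the slot machine lever again.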

Host: So, talk a bit more—I know you have a working paper on stochastic neural nets and generative models. Talk a little bit more about what you’re exploring here. You personally.

Katja Hofmann: One working paper we have on arXiv right now is looking at variational models for model-based reinforcement learning. So, the question we asked there is whether, starting with some initial set of data, it would be more effective and efficient to learn a model of that environment and then to compute or derive a structure of how to act, or whether what’s called model-free approaches, that directly learn a mapping to a policy, so directly map to an agent’s behavior, would be more effective. And there we found, first of all, that these types of models can very effectively learn the structure of the environment. But also, that we’re particularly able to use those models in situations where the environment might be changing. So, one scenario is if you get to initially explore the environment but you may not have any information about the task or the reward structure, then we’re able to show that we can leverage these models to learn effectively even before we see the reward structure, and that the agent can then effectively combine this with new information and perform the task very well even with a limited amount of data.
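
The scenario described here, exploring first and only later learning what the task is, can be sketched with a toy model-based agent. This is not the variational approach from the working paper: the sketch assumes a deterministic five-cell corridor, stores the learned model in a plain lookup table, and plans with value iteration. All names (corridor_step, explore_without_rewards, plan_with_model) are hypothetical. The point it illustrates is that a model learned without any reward signal can be reused to plan for different reward structures without further interaction.

```python
import random

ACTIONS = (-1, +1)        # hypothetical action set: step left, step right
STATES = tuple(range(5))  # hypothetical five-cell corridor


def corridor_step(state, action):
    """True dynamics of the toy world, unknown to the agent in advance."""
    return min(max(state + action, STATES[0]), STATES[-1])


def explore_without_rewards(n_steps=2000, seed=0):
    """Random exploration that records transitions and never sees a reward."""
    rng = random.Random(seed)
    model = {}
    state = STATES[0]
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)
        next_state = corridor_step(state, action)
        model[(state, action)] = next_state  # learned lookup-table model
        state = next_state
    return model


def plan_with_model(model, reward_fn, gamma=0.9, sweeps=100):
    """Value iteration on the learned model once a reward function is revealed."""
    values = {s: 0.0 for s in STATES}

    def q_value(state, action, vals):
        next_state = model.get((state, action))
        if next_state is None:
            return float("-inf")  # transition was never observed
        return reward_fn(next_state) + gamma * vals[next_state]

    for _ in range(sweeps):
        values = {s: max(q_value(s, a, values) for a in ACTIONS) for s in STATES}
    # Greedy policy with respect to the planned values.
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in STATES}


if __name__ == "__main__":
    learned_model = explore_without_rewards()
    # The same learned model serves two different tasks revealed afterwards.
    print(plan_with_model(learned_model, lambda s: 1.0 if s == STATES[-1] else 0.0))
    print(plan_with_model(learned_model, lambda s: 1.0 if s == STATES[0] else 0.0))
```

A model-free learner, by contrast, would have to gather new experience for each of the two tasks, because it never stores how the world responds to actions.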

(music plays)

Host: We’re told over and over that AI has great potential to benefit the world. But there’s a lot of speculation about what it will actually look like. Especially if we enable agents, on a large scale, to learn and make decisions. Maybe even independent of humans. And you’re doing work that enables AI agents to learn and make decisions. So, is there anything about that that keeps you up at night, and if so what are you doing about it?

Katja Hofmann: With all new technology that is initially poorly understood, there are a lot of open questions and a lot of uncertainty. As you note, especially with reinforcement learning technology, we still don’t have a lot of good examples of actually seeing this kind of technology deployed in real-world applications. And there’s key questions around how we can make sure that all those decisions that the agent learns to make are warranted and that we can explain and understand why they made those decisions. If those are agents that learn to make decisions in a way that allows them to interact quite directly with humans or much more flexibly than we’re currently used to, how can we make sure that those decisions and those interactions are actually intelligible to the users and that we have some way of understanding when an agent is learning something new and how our interaction with it affects what it’s learning? Those are really hard questions, and I think this is one reason why video games are such a fantastic platform for studying those kinds of technologies. They provide the sandbox where we can safely explore what it means to interact with something that is learning to interact with you. And how to frame that whole learning process and that interaction and make sure that we can understand what’s going on there.

Host: You have an incredibly interesting personal life story that includes first-hand experience of what I would say is one of the biggest events of the 20th century. Tell us a bit about your life, from your beginnings, and what got you interested in computer science and how did you come to be doing research at Microsoft Research in England?

Katja Hofmann: So, I was born in East Germany. So, I spent the first years of my life actually behind the Iron Curtain. And I am old enough to remember some of this. And it’s—I think it quite shaped who I am as a person. Initially there was this conviction that there’s all these countries that I would never travel to. And my mom is a geography teacher. So, she used to tell us about all these countries in the world. And it was just completely infeasible for us to even think about ever visiting them. And then, once the wall fell, it opened up all those opportunities and I’ve traveled to all continents except for Antarctica. I’ve studied in three different countries. Worked in four. So, you know, it’s just created all this life story that you know, when I was a little girl, I could’ve never imagined anything like this. So, it’s quite interesting to reflect on this. I think it also impacted how I came to encounter computers. In the GDR, there weren’t a lot of computers. So personal computers only became available after the wall fell. And that means that when I was growing up, when I was a teenager and my parents bought a computer because they heard that that could be something useful, there were no preconceptions about what you can and cannot do using a computer. There weren’t any role models that would’ve influenced me. And it was this box that stood in my room and that I could just use completely freely and figure out what it can do. I taught myself how to program and started exploring what’s possible there. And I think this kind of freedom and sense of creativity and being able to explore really shaped my perspective of computing and really influenced this decision to pursue computer science later on.

Host: So, from that room, and your three countries of study, how did you end up at Microsoft Research?

Katja Hofmann: It almost feels like that was by accident. I, again, wouldn’t have anticipated this. But during my PhD, I was working in an area called information retrieval. So, I was looking at how to make search engines more intelligent, and one of the threads that was already present was to understand how search engines could learn from their users to be better able to surface exactly what the user is looking for. So, it had this theme of interactive learning combined with search applications. One of the people that I really admired was working here at Microsoft Research in Cambridge and I introduced myself at a conference and we chatted about this work. That led to an internship at Microsoft Research and Bing in Redmond at the time, and gave me this opportunity to explore what possibilities there were and what it meant to be working at Microsoft Research. That was a fantastic experience. So, when I was asked whether I wanted to apply for a post-doc position here, I was thrilled and that was the top choice that I had.

Host: Well, as we close Katja, I like to ask researchers what advice they would give to other aspiring researchers. Especially those who might be interested in the kind of research you’re doing—machine intelligence and perception. And I’m especially interested in what you’d say to young women who might be considering following in your footsteps.

Katja Hofmann: That’s a very, very big question. Ummm…

Host: Give us a big answer.

Katja Hofmann: What I’d like to see more in researchers is thinking more about why we do what we do. Academic and intellectual curiosity is one thing, but the research that we do in AI and machine learning has such huge potential to change the world. Anyone, be that an expert in the field or someone on the street, will tell you that AI will have a fundamental impact on people’s lives. So, I’d like to encourage people to think more about what matters about their research. Why are they doing this research and how we should be using it to influence and make sure that the world we create is the one we actually want to live in in the future.

Host: Just drilling in a little bit, you don’t see a lot of women, proportionally, in machine learning research. What do you think was instrumental in getting you interested in going into that field?

Katja Hofmann: I think what played a big part was that I had this opportunity to just completely freely discover this area and figure out what it meant for me before being confronted with maybe preconceptions and ideas that people had about what computing was and what it was for and who it was for. And I see a lot of the imbalance that we’re seeing today very much reflecting these preconceptions that people have. I’d love to play a part in changing those preconceptions. I think they are incredibly hurtful and sometimes prevent very bright, very talented people, who would work really hard to make an impact on this field, from entering it. It’s hard to derive specific advice from that. You can say try to isolate yourself from what everyone thinks, and that’s not an easy thing to do.

Host: No, but you know what? Right now, I think you are an inspiration on what can be done, and what you can accomplish so…

Katja Hofmann: I certainly hope so.

Host: Katja Hofmann, thank you so much for taking time at the end of your day and the beginning of mine to come on the podcast today.

Katja Hofmann: Thank you so much for having me. It was a great pleasure.

To learn more about Dr. Katja Hofmann, and how Project Malmo is pushing the state-of-the-art in reinforcement learning, visit Microsoft.com/research.
