{"id":631245,"date":"2020-01-16T11:12:18","date_gmt":"2020-01-16T19:12:18","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=631245"},"modified":"2020-02-10T07:07:00","modified_gmt":"2020-02-10T15:07:00","slug":"by-making-text-based-games-more-accessible-to-rl-agents-jericho-framework-opens-up-exciting-natural-language-challenges","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/by-making-text-based-games-more-accessible-to-rl-agents-jericho-framework-opens-up-exciting-natural-language-challenges\/","title":{"rendered":"By making text-based games more accessible to RL agents, Jericho framework opens up exciting natural language challenges"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-631260 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/MSResearch_20200114_ProjectJericho_1400x788.gif\" alt=\"animated diagram \" width=\"1400\" height=\"786\" \/><\/p>\n<p>You\u2019re in a field. In front of you, there\u2019s a white house. The door is boarded shut. The immediate challenge\u2014investigate the house. The game\u2014<em>Zork I: The Great Underground Empire,<\/em> a treasure-seeking adventure in which you\u2019ll encounter monsters, a thief, and other interesting characters along the way.<\/p>\n<p>As a player of this text-based game, you string together simple commands of only several words, like \u201cwalk to the house.\u201d Once there, you type a series of commands, not all of them fruitful, to circle the house until you find a way in. There is a window ajar. You \u201copen the window\u201d and \u201center the house,\u201d the adventure truly beginning. 
To move between rooms within the house and locations beyond, and to interact with the objects you find, you rely on two things: your ability to recall prior information, such as realizing that the lantern you need to safely explore underground is in a previous location, and commonsense knowledge of what to do with it once you find it.<\/p>\n<p>Interactive fiction (IF) games such as <em>Zork<\/em> provide great environments for reinforcement learning agents to hone their natural language understanding and generation. However, without the commonsense knowledge and effective recall human players possess, these agents face a task far more monumental than ours: choosing from <em>billions<\/em> of possible actions. For example, an agent generating a modest four-word command from a provided vocabulary of 700 words is effectively navigating a space of size 700<sup>4<\/sup>, or roughly 240 billion possible actions.<\/p>\n<p>To address this challenge and help researchers take advantage of this valuable testing ground, we introduce <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/jericho\">Jericho, an open-source environment for agents to interface with IF games<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Jericho supports template-based action generation, in which an agent first selects the template of an action it wishes to execute and then selects words from the game\u2019s vocabulary to fill in the blanks of the template. This significantly reduces the action space and makes the exploration problem far more tractable. 
We\u2019re presenting the paper, \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/interactive-fiction-games-a-colossal-adventure\/\">Interactive Fiction Games: A Colossal Adventure,<\/a>\u201d at the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aaai.org\/Conferences\/AAAI-20\/\">34th AAAI Conference on Artificial Intelligence<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<div id=\"attachment_631281\" style=\"width: 871px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-631281\" class=\"wp-image-631281 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Image.jpg\" alt=\"text from interactive fiction game \" width=\"861\" height=\"450\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Image.jpg 861w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Image-300x157.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Image-768x401.jpg 768w\" sizes=\"auto, (max-width: 861px) 100vw, 861px\" \/><p id=\"caption-attachment-631281\" class=\"wp-caption-text\">Above is an excerpt from the interactive fiction (IF) game <em>9:05<\/em>. IF games are text-based, providing a great opportunity for reinforcement learning agents to hone such skills as natural language understanding and generation, sequential decision-making, and reasoning. However, given large action spaces and other challenges, they have generally been difficult environments for RL agents to learn in. 
Jericho is a framework designed to make them more accessible.<\/p><\/div>\n<h3>Making IF games more approachable<\/h3>\n<p>Generating coherent language-based commands is a challenge for existing RL agents, as the space of possible commands grows <em>combinatorially<\/em> as shown by the example above. Existing agents commonly operate on action spaces with only tens or hundreds of possible actions and are largely unable to tractably explore action spaces presented by IF games without prohibitively long training times. Jericho helps address this challenge by revealing the list of action templates and vocabulary words that are recognized by the game, normally hidden from players in a non-human-readable format, for agents to choose from. The template example \u201ctake ___ from ___\u201d could result in a successful action when combined with the vocabulary words <em>lantern<\/em> and <em>case<\/em>, for instance.<\/p>\n<p>Providing templates and vocabulary can reduce the size of the action space by several orders of magnitude\u2014from 240 billion possible actions to 98 million. Without game-specific vocabulary and templates, researchers might be inclined to provide their agents with an English dictionary from which to form commands\u2014hence the exponentially large action space\u2014or create a smaller vocabulary list, running the risk of unintentionally leaving out a game-specific key term, like <em>lantern<\/em> in <em>Zork I<\/em>. 
With Jericho\u2019s vocabulary, you\u2019re guaranteed to not miss crucial words.<\/p>\n<p>In addition to template-based action generation, Jericho provides other features to make IF games more accessible to existing agents, including the following:<\/p>\n<ul>\n<li><strong>World-object-tree representation<\/strong>: Because of the large number of locations, objects, and characters in many games and the possibility of puzzles requiring objects not present in the current location, agents need to develop ways to remember and reason about previous interactions. World-object-tree representations of the game state enumerate these elements.<\/li>\n<li><strong>Fixed random seed to enforce determinism<\/strong>: By making games deterministic, where subsequent states are a direct result of a specific action taken by an agent, Jericho enables the use of targeted exploration algorithms like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1901.10995\">Go-Explore<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which systematically build and expand a library of the visited states.<\/li>\n<li><strong>Load\/save functionality:<\/strong> This feature enables restoration of previous game states, enabling the use of planning algorithms like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/hal.inria.fr\/inria-00116992\/document\">Monte-Carlo tree search<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/li>\n<li><strong>World-change detection and valid-action identification:<\/strong> This feature provides feedback on the success or failure of an agent\u2019s last action to effect a change in the game state. 
Furthermore, Jericho can perform a search to identify <em>valid actions<\/em>, those that lead to changes in the game state.<\/li>\n<\/ul>\n<p>Researchers can control the difficulty of the learning problem by choosing which of Jericho\u2019s features to employ.<\/p>\n<h3>Learning agents<\/h3>\n<p>We applied two learning agents to the games supported by the Jericho framework: Template-DQN (TDQN) and deep reinforcement relevance network (DRRN).<\/p>\n<div id=\"attachment_631284\" style=\"width: 911px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-631284\" class=\"wp-image-631284 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Diagram.jpg\" alt=\"side by side diagrams\" width=\"901\" height=\"584\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Diagram.jpg 901w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Diagram-300x194.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/Jericho-Diagram-768x498.jpg 768w\" sizes=\"auto, (max-width: 901px) 100vw, 901px\" \/><p id=\"caption-attachment-631284\" class=\"wp-caption-text\">Two learning agents, TDQN (left) and DRRN (right), were applied to the games supported by the Jericho framework. Leveraging Jericho features, they outperformed two baseline agents, demonstrating that the framework can help make IF games more accessible to agents for improving language-based skills.<\/p><\/div>\n<p>Both agents (above) employ a common input representation, generated after each command and consisting of the current textual observation <em><strong>o<sub>nar<\/sub><\/strong><\/em>, inventory text <em><strong>o<sub>inv<\/sub><\/strong><\/em>, and current location description <em><strong>o<sub>desc<\/sub><\/strong><\/em> (as given by a <em>look<\/em> command). 
The following is an example of the common input representation generated in <em>Zork I<\/em> after the command \u201copen window\u201d:<\/p>\n<p style=\"padding-left: 40px;\"><em><strong>o<sub>nar<\/sub><\/strong>: With great effort, you open the window far enough to allow entry.<\/em><br \/>\n<em><strong>o<sub>inv<\/sub><\/strong>: You are empty-handed.<\/em><br \/>\n<em><strong>o<sub>desc<\/sub><\/strong>: You are behind the white house. <\/em><em>A path leads into the forest to the east. <\/em><em>In one corner of the house there is a small window which is slightly ajar.<\/em><\/p>\n<p>While both agents use the same input representation, they differ in how they select actions. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/deep-reinforcement-learning-natural-language-action-space\/\">DRRN<\/a> uses Jericho\u2019s valid-action identification to obtain the set of valid actions and estimates a Q-value for each valid action <em>a<\/em>. It then either acts greedily by selecting the action with the highest Q-value or explores by sampling from the distribution of valid actions.<\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/tdqn\">TDQN<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, based on the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.aclweb.org\/anthology\/D15-1001.pdf\">LSTM-DQN algorithm<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, generates separate Q-value outputs over the set of templates, <em>Q(o,u)<\/em>, and over the vocabulary words that fill each template slot, <em>Q(o,p<sub>1<\/sub>)<\/em> and <em>Q(o,p<sub>2<\/sub>)<\/em>. Thus, it must contend with the full template-based action space (98 million possible actions in <em>Zork I<\/em>). 
Jericho\u2019s valid actions are also used during training, through a supervised loss that helps steer the agent toward commands that will yield state changes, but they are not required for running the policy after training.<\/p>\n<p>Both DRRN and TDQN use the load\/save feature of Jericho to create the common input representation.<\/p>\n<h3>The results<\/h3>\n<p>We evaluated TDQN and DRRN across a diverse set of 32 games, including <em>Zork I<\/em>, with the aim of creating a reproducible benchmark to help the community track progress and advance the state of the art. We compared the two learning agents to two non-learning baseline agents: a random agent that picks randomly from a set of 12 common IF actions at each step, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/nail-a-general-interactive-fiction-agent\/\">NAIL, a competition-winning heuristic-based IF agent<\/a>. Neither baseline agent used any of Jericho\u2019s features.<\/p>\n<p>We observed the following average completion rates, where 100 percent means finishing the game with the maximum score: the random agent, 1.8 percent; NAIL, 4.9 percent; TDQN, 6.1 percent; and DRRN, 10.7 percent. TDQN and DRRN accumulate significantly higher scores than the other agents, even when dealing with action spaces as large as 98 million actions. The success of these learning agents demonstrates that Jericho is effective at reducing the difficulty of IF games and making them more accessible for RL agents to learn and improve language-based skills.<\/p>\n<h3>An exciting opportunity<\/h3>\n<p>Much improvement is needed before these algorithms and others can rival skilled humans at these games. 
We believe it will be necessary to incorporate better priors that convey human-like understandings of commonsense reasoning and knowledge representation to get there.<\/p>\n<p>Without visuals grounding the language, IF games present an exciting opportunity to advance natural language understanding, natural language generation, and sequential decision-making in RL agents, which we see impacting such real-world applications as voice-activated personal assistants. To learn more about the work Microsoft Research is doing with IF games, check out <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/textworld\/\">TextWorld<\/a>, a controlled framework for creating computer-generated IF games of varying levels of difficulty.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You\u2019re in a field. In front of you, there\u2019s a white house. The door is boarded shut. The immediate challenge\u2014investigate the house. The game\u2014Zork I: The Great Underground Empire, a treasure-seeking adventure in which you\u2019ll encounter monsters, a thief, and other interesting characters along the way. 
As a player of this text-based game, you string [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":636015,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":0,"footnotes":""},"categories":[194467],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-631245","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artifical-intelligence","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[633696],"related-researchers":[{"type":"user_nicename","value":"Marc-Alexandre C\u00f4t\u00e9","user_id":37197,"display_name":"Marc-Alexandre C\u00f4t\u00e9","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/macote\/\" aria-label=\"Visit the profile page for Marc-Alexandre C\u00f4t\u00e9\">Marc-Alexandre C\u00f4t\u00e9<\/a>","is_active":false,"last_first":"C\u00f4t\u00e9, Marc-Alexandre","people_section":0,"alias":"macote"},{"type":"user_nicename","value":"Xingdi (Eric) Yuan","user_id":37167,"display_name":"Eric Yuan","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/eryua\/\" aria-label=\"Visit the profile page for Eric Yuan\">Eric Yuan<\/a>","is_active":false,"last_first":"Yuan, 
Eric","people_section":0,"alias":"eryua"},{"type":"guest","value":"prithviraj-ammanabrolu","user_id":"631548","display_name":"Prithviraj  Ammanabrolu","author_link":"<a href=\"http:\/\/prithvirajva.com\/\" aria-label=\"Visit the profile page for Prithviraj  Ammanabrolu\">Prithviraj  Ammanabrolu<\/a>","is_active":true,"last_first":"Ammanabrolu, Prithviraj ","people_section":0,"alias":"prithviraj-ammanabrolu"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/MSResearch_20200116_ProjectJericho_1200x628-960x540.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/MSResearch_20200116_ProjectJericho_1200x628-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/MSResearch_20200116_ProjectJericho_1200x628-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/MSResearch_20200116_ProjectJericho_1200x628-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/MSResearch_20200116_ProjectJericho_1200x628-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/01\/MSResearch_20200116_ProjectJericho_1200x628-640x360.png 640w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"Matthew Hausknecht, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/macote\/\" title=\"Go to researcher profile for Marc-Alexandre C\u00f4t\u00e9\" aria-label=\"Go to researcher profile for Marc-Alexandre C\u00f4t\u00e9\" data-bi-type=\"byline author\" data-bi-cN=\"Marc-Alexandre C\u00f4t\u00e9\">Marc-Alexandre C\u00f4t\u00e9<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/eryua\/\" title=\"Go to researcher profile for Eric Yuan\" aria-label=\"Go to researcher profile for Eric 
Yuan\" data-bi-type=\"byline author\" data-bi-cN=\"Eric Yuan\">Eric Yuan<\/a>, and <a href=\"http:\/\/prithvirajva.com\/\" title=\"Go to researcher profile for Prithviraj  Ammanabrolu\" aria-label=\"Go to researcher profile for Prithviraj  Ammanabrolu\" data-bi-type=\"byline author\" data-bi-cN=\"Prithviraj  Ammanabrolu\">Prithviraj  Ammanabrolu<\/a>","formattedDate":"January 16, 2020","formattedExcerpt":"You\u2019re in a field. In front of you, there\u2019s a white house. The door is boarded shut. The immediate challenge\u2014investigate the house. The game\u2014Zork I: The Great Underground Empire, a treasure-seeking adventure in which you\u2019ll encounter monsters, a thief, and other interesting characters along the&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/631245","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=631245"}],"version-history":[{"count":10,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/631245\/revisions"}],"predecessor-version":[{"id":631719,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/631245\/revisions\/631719"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/636015"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=631245"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/researc
h\/wp-json\/wp\/v2\/categories?post=631245"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=631245"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=631245"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=631245"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=631245"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=631245"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=631245"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=631245"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=631245"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=631245"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}