{"id":455559,"date":"2017-01-09T00:00:55","date_gmt":"2017-01-09T08:00:55","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=455559"},"modified":"2018-01-23T20:02:19","modified_gmt":"2018-01-24T04:02:19","slug":"creating-curious-machines-building-information-seeking-agents","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/creating-curious-machines-building-information-seeking-agents\/","title":{"rendered":"Creating curious machines: Building information-seeking agents"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>Humans have an innate desire to know and understand. From a child learning to ride a bike to an adult gaining skills in an online course, we constantly absorb information from our environment through interaction. Motivated by this observation, we\u2019ve developed a suite of tasks that teach artificial agents how to seek information actively, by asking questions. We\u2019ve also designed a deep neural agent that learns to accomplish these tasks through efficient information-seeking behavior. Such behavior is a vital research step towards Artificial General Intelligence.<\/p>\n<h2>Asking the right questions<\/h2>\n<p>Let\u2019s say you\u2019re at a dinner party with friends and you decide to play 20 Questions. It\u2019s your turn and you choose \u2018cat\u2019 as the thing for others to guess. They begin by asking broad questions: \u201cIs it alive?\u201d, \u201cIs it a person?\u201d, \u201cIs it an animal?\u201d, \u201cDoes it live underwater?\u201d. The person who correctly identifies the item first is the winner, so your friends are not just trying to get the right answer, they\u2019re trying to do so with as few questions as possible. 
Based on your simple yes-or-no responses, your friends can quickly narrow down the set of viable items until one correctly guesses \u2018cat\u2019.<\/p>\n<p>This example demonstrates the iterative nature of information seeking: the information currently sought must be intelligently conditioned on the information already acquired. To be effective, an information-seeking agent must in some sense understand the state of its current knowledge. It must know what it knows, and how to bridge the gap between what it knows and what it needs to know.<\/p>\n<p>The 20 Questions example also highlights how communication necessarily takes place over a restricted channel: each answer is a simple \u2018yes\u2019 or \u2018no\u2019 (conveying just one bit of information), and the number of questions is limited. Real-world information seeking is typically restricted in a similar sense &#8212; we communicate via finite languages, over limited amounts of time. Consider searching online to choose a gift for a friend. Perhaps you start broadly &#8212; guided loosely by age, gender, and budget &#8212; then home in based on specific interests and recommendations.<\/p>\n<p>Because of its fundamental role in intelligent behavior, information seeking has been studied from a variety of perspectives, including cognitive science, psychology, neuroscience, and machine learning. In neuroscience, for instance, information-seeking strategies are often explained by biases toward novel, surprising, or uncertain events (Ranganath & Rainer, 2003). Information seeking is a key component in formal notions of fun and creativity (Schmidhuber, 2010), and intrinsic motivation (Oudeyer et al., 2007). 
It is also closely related to the concept of attention, which improves efficiency by ignoring irrelevant features (Mnih et al., 2014) and may be considered a strategy for information seeking.<\/p>\n<h2>New tasks for exploring information seeking<\/h2>\n<p>Researchers have used different tools and systems to help train intelligent agents, from datasets through to bespoke learning environments. The use of games like chess, Go, and the Atari suite has been incredibly fruitful in this regard. Similarly, many of the games that humans enjoy seem expressly designed to train efficient information seeking, or at least to exploit our joy in exercising this skill.<\/p>\n<p>Thus motivated, we designed a suite of tasks to train and evaluate information-seeking behavior. Three of these tasks are demonstrated here (see our paper for more details):<\/p>\n\t<iframe\n\t\tsrc=\"https:\/\/www.youtube.com\/embed\/3bSquT1zqj8\"\n\t\twidth=\"640\"\n\t\theight=\"360\"\n\t\taria-label=\"\"\n\t\tallowfullscreen=\"true\">\n\t<\/iframe>\n\t\n<p>The highlighted tasks were Hangman, Face Challenge, and War Boat.\u00a0Each of these tasks has a distinct mode of play and unique rules and objectives. Each requires the ability to seek information iteratively based on an agent\u2019s current \u201cpicture\u201d of the world. 
The tasks are illustrated in the GIFs below.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-455586\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/01\/Maluuba-AGI-Hangman.gif\" alt=\"Maluuba+-+AGI+-+Hangman\" width=\"480\" height=\"270\" \/>Hangman:\u00a0The classic game where an agent must identify a phrase within a set number of turns by guessing letters of the alphabet.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-455640\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/01\/1484068118530.gif\" alt=\"Hangman\" width=\"480\" height=\"270\" \/><\/p>\n<p>Face Challenge: An agent must determine the answer to questions like \u201cIs this person wearing a hat?\u201d or \u201cDoes this person have a moustache?\u201d by peeking at small chunks of an occluded portrait.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-455646\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/01\/warboat.gif\" alt=\"War boat\" width=\"480\" height=\"270\" \/><\/p>\n<p>War Boat: An agent aims to sink an opponent\u2019s naval fleet, which is randomly positioned on a hidden grid. Correctly guessing a point where a boat is located scores a \u2018hit\u2019, and with enough hits the boat sinks.<\/p>\n<h2>Training models to seek information<\/h2>\n<p>The actions agents perform in our tasks can be interpreted as questions asked of the environment, e.g. \u201cDoes this phrase contain the letter \u2018a\u2019?\u201d or \u201cWhat does this block of pixels look like?\u201d. To succeed, an agent must learn to ask useful questions and assimilate the information it obtains.<\/p>\n<p>We developed a model that can be trained to do just that. 
At each step in completing a task, the model asks what it believes to be the most useful question, receives a response from the environment, and integrates that response with its existing knowledge. The model is a deep neural network that we trained through a combination of reinforcement learning techniques (specifically, Generalized Advantage Estimation; Schulman et al., 2016) and backpropagation. See the paper for full details.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-455655\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/01\/Top-down.png\" alt=\"Top down and bottom up networks\" width=\"548\" height=\"344\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/01\/Top-down.png 548w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/01\/Top-down-300x188.png 300w\" sizes=\"auto, (max-width: 548px) 100vw, 548px\" \/>During training, the agent maximizes a reward that combines task-specific extrinsic rewards and a task-agnostic intrinsic reward. The extrinsic rewards encourage the agent to achieve its goal using as few questions as possible. The intrinsic reward encourages the model to ask questions that provide the most new information about the environment. Specifically, we reward each question according to how much its answer increases the similarity between the model\u2019s belief about the world and the actual state of the world. Thus, the agent learns to efficiently form an accurate internal picture of its environment.<\/p>\n<h2>Towards artificial general intelligence<\/h2>\n<p>As the demo shows, our methods produce agents that succeed across a broad range of tasks. The same approach can be applied to language, image, and strategy domains. 
In our tasks, the trained agents exhibit interpretable, intelligent information-seeking behavior, often performing at super-human levels.<\/p>\n<p>We believe that information seeking plays a fundamental role in General Intelligence. Our present work is a small step towards this grander goal.<\/p>\n<p>Read the paper<\/p>\n<p><strong>References<\/strong><\/p>\n<ul>\n<li>Charan Ranganath and Gregor Rainer. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/1998127479\" target=\"_blank\" rel=\"noopener noreferrer\">Neural mechanisms for detecting and remembering novel events,<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> Nature Reviews Neuroscience, 4(3):193\u2013202, 2003.<\/li>\n<li>J\u00fcrgen Schmidhuber. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/people.idsia.ch\/~juergen\/creativity.html\" target=\"_blank\" rel=\"noopener noreferrer\">Formal theory of creativity, fun, and intrinsic motivation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230\u2013247, 2010.<\/li>\n<li>Pierre-Yves Oudeyer, Fr\u00e9d\u00e9ric Kaplan, and Verena V. Hafner. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/2101524054\" target=\"_blank\" rel=\"noopener noreferrer\">Intrinsic motivation systems for autonomous mental development<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, 2007.<\/li>\n<li>Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. 
<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/papers.nips.cc\/paper\/5542-recurrent-models-of-visual-attention\" target=\"_blank\" rel=\"noopener noreferrer\">Recurrent models of visual attention,<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in Advances in Neural Information Processing Systems (NIPS), pp. 2204\u20132212, 2014.<\/li>\n<li>John Schulman, Philipp Moritz, Sergey Levine, Michael I Jordan, and Pieter Abbeel. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/1506.02438\" target=\"_blank\" rel=\"noopener noreferrer\">High-dimensional continuous control using generalized advantage estimation,<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in International Conference on Learning Representations (ICLR), 2016.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Humans have an innate desire to know and understand. From a child learning to ride a bike to an adult gaining skills in an online course, we constantly absorb information from our environment through interaction. 
Motivated by this observation, we\u2019ve developed a suite of tasks that teach artificial agents how to seek information actively, [&hellip;]<\/p>\n","protected":false},"author":37173,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[241770],"tags":[187359],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-455559","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","tag-artificial-intelligence","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[437514],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"January 9, 2017","formattedExcerpt":"&nbsp; Humans have an innate desire to know and understand. From a child learning to ride a bike to an adult gaining skills in an online course, we constantly absorb information from our environment through interaction. 
Motivated by this observation, we\u2019ve developed a suite of&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/455559","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37173"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=455559"}],"version-history":[{"count":18,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/455559\/revisions"}],"predecessor-version":[{"id":456270,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/455559\/revisions\/456270"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=455559"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=455559"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=455559"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=455559"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=455559"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=455559"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=455559"},{"taxonomy":"msr-post-option","embeddable":t
rue,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=455559"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=455559"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=455559"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=455559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}