{"id":938229,"date":"2023-05-04T10:00:00","date_gmt":"2023-05-04T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=938229"},"modified":"2023-05-16T12:37:27","modified_gmt":"2023-05-16T19:37:27","slug":"using-generative-ai-to-imitate-human-behavior","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/using-generative-ai-to-imitate-human-behavior\/","title":{"rendered":"Using generative AI to imitate human behavior"},"content":{"rendered":"\n<p class=\"has-text-align-center h6\"><em>This research was accepted by the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3A%2F%2Ficlr.cc%2FConferences%2F2023&data=05%7C01%7Cv-amelfi%40microsoft.com%7C4f8d3ad02e2e4d7783fe08db4a7e2690%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638185677904637846%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ZWTF5GUK8hIi3OR0tBRtSucX4Xd309Bl%2BBlqgERCbT0%3D&reserved=0\" target=\"_blank\" rel=\"noopener noreferrer\">2023 International Conference on Learning Representations (ICLR)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which is dedicated to the advancement of the branch of artificial intelligence generally referred to as deep learning.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"646\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig1_Our-Method.jpg\" alt=\"An overview of our method, providing a side-by-side comparison of text-to-image diffusion, with observation-to-action diffusion. 
On the right are diagrams of the different denoising architectures tested, as well as an illustration of the sampling schemes explored.\" class=\"wp-image-938259\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig1_Our-Method.jpg 1920w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig1_Our-Method-300x101.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig1_Our-Method-1024x345.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig1_Our-Method-768x258.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig1_Our-Method-1536x517.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig1_Our-Method-240x81.jpg 240w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><figcaption class=\"wp-element-caption\">Figure 1: Overview of our method.<\/figcaption><\/figure>\n\n\n\n<p>Diffusion models have emerged as a powerful class of generative AI models. They have been used to generate photorealistic images and short videos, compose music, and synthesize speech. And their uses don\u2019t stop there. In our new paper, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/imitating-human-behaviour-with-diffusion-models\/\">Imitating Human Behaviour with Diffusion Models<\/a>, we explore how they can be used to imitate human behavior in interactive environments.<\/p>\n\n\n\n<p>This capability is valuable in many applications. 
For instance, it could help automate repetitive manipulation tasks in robotics, or it could be used to create humanlike AI in video games, which could lead to exciting new game experiences\u2014a goal particularly dear to <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/game-intelligence\/\">our team<\/a>.<\/p>\n\n\n\n<p>We follow a machine learning paradigm known as <em>imitation learning<\/em> (more specifically, <em>behavior cloning<\/em>). In this paradigm, we are provided with a dataset of the observations a person saw and the actions they took while acting in an environment, and we would like an AI agent to mimic this behavior. In interactive environments, at each time step, an observation \\( o_t \\) is received (e.g. a screenshot of a video game), and an action \\( a_t \\) is then selected (e.g. the mouse movement). With this dataset of many \\( o \\)\u2019s and \\( a \\)\u2019s performed by some demonstrator, a model \\( \\pi \\) could try to learn this mapping of observation-to-action, \\( \\pi(o) \\to a \\).<\/p>\n\n\n\n<p>When the actions are continuous, training a model to learn this mapping introduces some interesting challenges. In particular, what loss function should be used? A simple choice is mean squared error, as often used in supervised regression tasks. 
In an interactive environment, this objective encourages an agent to learn the <em>average<\/em> of all the behaviors in the dataset.<\/p>\n\n\n\n<p>If the goal of the application is to generate diverse human behaviors, the average might not be very useful. After all, humans are stochastic (they act on whims) and multimodal creatures (different humans might make different decisions). Figure 2 depicts the failure of mean squared error to mimic the true action distribution (marked in yellow) when it is multimodal. It also includes several other popular choices for the loss function when doing behavior cloning.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1503\" height=\"231\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/intro_figure-2.jpg\" alt=\"This toy example (based on an arcade claw game) shows an action space with two continuous action dimensions. It shows that popular choices of behavioral cloning loss fail to capture the true distribution, but diffusion models offer a good approximation.\" class=\"wp-image-938265\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/intro_figure-2.jpg 1503w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/intro_figure-2-300x46.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/intro_figure-2-1024x157.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/intro_figure-2-768x118.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/intro_figure-2-240x37.jpg 240w\" sizes=\"auto, (max-width: 1503px) 100vw, 1503px\" \/><figcaption class=\"wp-element-caption\">Figure 2: This toy example (based on an arcade claw game) shows an action space with two continuous action dimensions. 
Here the demonstration distribution is marked in yellow\u2014it is multimodal and has correlations between action dimensions. Diffusion models offer a good imitation of the full diversity in the dataset.<\/figcaption><\/figure>\n\n\n\n<p>Ideally, we\u2019d like our models to learn the full variety of human behaviors. And this is where generative models help. Diffusion models are a class of generative models that are both stable to train and easy to sample from. They have been very successful in the text-to-image domain, which shares this one-to-many challenge\u2014a single text caption might be matched by multiple different images.<\/p>\n\n\n\n<p>Our work adapts ideas that have been developed for text-to-image diffusion models to this new paradigm of observation-to-action diffusion. Figure 1 highlights some differences. One obvious point is that the object we are generating is now a low-dimensional action vector (rather than an image). This calls for a new design for the denoising network architecture. In image generation, heavy convolutional <em>U-Nets<\/em> are in vogue, but these are less applicable to low-dimensional vectors. Instead, we designed and tested the three different architectures shown in Figure 1.<\/p>\n\n\n\n<p>In observation-to-action models, sampling a single bad action during an episode can throw an agent off course, and hence we were motivated to develop sampling schemes that more reliably return good action samples (also shown in Figure 1). This problem is less severe in text-to-image models, since users often have the luxury of selecting a single image from among several generated samples and ignoring any bad images. 
Figure 3 shows an example of this, where a user might cherry-pick their favorite while ignoring the one with nonsensical text.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"684\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig3-arcade_everative-AI.png\" alt=\"Four samples from a text-to-image diffusion model from Bing using the prompt \u201cA cartoon style picture of people playing with arcade claw machine\u201d. Some of the samples are good quality, some contain errors, for example the text in one image is nonsensical.\" class=\"wp-image-938295\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig3-arcade_everative-AI.png 633w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig3-arcade_everative-AI-278x300.png 278w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/Fig3-arcade_everative-AI-167x180.png 167w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" \/><figcaption class=\"wp-element-caption\">Figure 3: Four samples from a text-to-image diffusion model from Bing (note this is not our own work), using the prompt \u201cA cartoon style picture of people playing with arcade claw machine\u201d.<\/figcaption><\/figure>\n\n\n\n<p>We tested our diffusion agents in two different environments. The first, a simulated kitchen environment, is a challenging high-dimensional continuous control problem where a robotic arm must manipulate various objects. The demonstration dataset is collected from a variety of humans performing various tasks in differing orders. Hence there is rich multimodality in the dataset.<\/p>\n\n\n\n<p>We found that diffusion agents outperformed baselines in two respects. 1) The diversity of behaviors they learned was broader and closer to the human demonstrations. 
2) The rate of task completion (a proxy for reward) was better.<\/p>\n\n\n\n<p>The videos below highlight the ability of diffusion to capture multimodal behavior\u2013starting from the same initial conditions, we roll out the diffusion agent eight times. Each time it selects a different sequence of tasks to complete.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"204\" height=\"205\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-1.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938682\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"204\" height=\"205\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-2.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938685\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"204\" height=\"205\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-3.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938688\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" 
width=\"204\" height=\"205\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-4.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938691\"\/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"204\" height=\"205\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-5.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938694\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"204\" height=\"205\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-6.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938697\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"204\" height=\"205\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-7.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938700\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"204\" height=\"205\" 
src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/robot-arm-8.gif\" alt=\"A short clip showing a robotic arm interacting with a kitchen environment performing a specific task.\" class=\"wp-image-938703\"\/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>The second environment tested was a modern 3D video game, Counter-strike. We refer interested readers to the paper for results.<\/p>\n\n\n\n<p>In summary, our work has demonstrated how exciting recent advances in generative modeling can be leveraged to build agents that can behave in humanlike ways in interactive environments. We\u2019re excited to continue exploring this direction \u2013 watch this space for future work.<\/p>\n\n\n\n<p>For more detail on our work, please see our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2301.10677\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/Imitating-Human-Behaviour-w-Diffusion\">code repo<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Diffusion models have been used to generate photorealistic images and short videos, compose music, and synthesize speech. In a new paper, Microsoft Researchers explore how they can be used to imitate human behavior in interactive environments. 
<\/p>\n","protected":false},"author":42183,"featured_media":938712,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Tim Pearce","user_id":"41719"},{"type":"user_nicename","value":"Tabish Rashid","user_id":"41784"},{"type":"user_nicename","value":"Anssi Kanervisto","user_id":"41689"},{"type":"user_nicename","value":"Dave Bignell","user_id":"38320"},{"type":"user_nicename","value":"Mingfei Sun","user_id":"39474"},{"type":"user_nicename","value":"Raluca Georgescu","user_id":"37392"},{"type":"user_nicename","value":"Sergio Valcarcel Macua","user_id":"42507"},{"type":"user_nicename","value":"Shanzheng Tan","user_id":"41551"},{"type":"user_nicename","value":"Ida Momennejad","user_id":"39832"},{"type":"user_nicename","value":"Katja Hofmann","user_id":"32468"},{"type":"user_nicename","value":"Sam Devlin","user_id":"37550"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-938229","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199561],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[583324,1142579],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Dave 
Bignell","user_id":38320,"display_name":"Dave Bignell","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dabignel\/\" aria-label=\"Visit the profile page for Dave Bignell\">Dave Bignell<\/a>","is_active":false,"last_first":"Bignell, Dave","people_section":0,"alias":"dabignel"},{"type":"user_nicename","value":"Raluca Georgescu","user_id":37392,"display_name":"Raluca Stevenson","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rageorg\/\" aria-label=\"Visit the profile page for Raluca Stevenson\">Raluca Stevenson<\/a>","is_active":false,"last_first":"Stevenson, Raluca","people_section":0,"alias":"rageorg"},{"type":"user_nicename","value":"Sergio Valcarcel Macua","user_id":42507,"display_name":"Sergio Valcarcel Macua","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sergiov\/\" aria-label=\"Visit the profile page for Sergio Valcarcel Macua\">Sergio Valcarcel Macua<\/a>","is_active":false,"last_first":"Valcarcel Macua, Sergio","people_section":0,"alias":"sergiov"},{"type":"user_nicename","value":"Ida Momennejad","user_id":39832,"display_name":"Ida Momennejad","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/idamo\/\" aria-label=\"Visit the profile page for Ida Momennejad\">Ida Momennejad<\/a>","is_active":false,"last_first":"Momennejad, Ida","people_section":0,"alias":"idamo"},{"type":"user_nicename","value":"Katja Hofmann","user_id":32468,"display_name":"Katja Hofmann","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/kahofman\/\" aria-label=\"Visit the profile page for Katja Hofmann\">Katja Hofmann<\/a>","is_active":false,"last_first":"Hofmann, Katja","people_section":0,"alias":"kahofman"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-960x540.jpg\" 
class=\"img-object-cover\" alt=\"diagram\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ICLR_Showcase_Stagnant_Hero_1400x788.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"May 4, 2023","formattedExcerpt":"Diffusion models have been used to generate photorealistic images and short videos, compose music, and synthesize speech. 
In a new paper, Microsoft Researchers explore how they can be used to imitate human behavior in interactive environments.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/938229","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=938229"}],"version-history":[{"count":24,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/938229\/revisions"}],"predecessor-version":[{"id":938982,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/938229\/revisions\/938982"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/938712"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=938229"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=938229"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=938229"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=938229"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=938229"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=938229"},{"tax
onomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=938229"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=938229"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=938229"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=938229"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=938229"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}