{"id":1160901,"date":"2026-02-05T09:00:00","date_gmt":"2026-02-05T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1160901"},"modified":"2026-02-03T10:29:20","modified_gmt":"2026-02-03T18:29:20","slug":"rethinking-imitation-learning-with-predictive-inverse-dynamics-models","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/rethinking-imitation-learning-with-predictive-inverse-dynamics-models\/","title":{"rendered":"Rethinking\u00a0imitation\u00a0learning\u00a0with Predictive Inverse Dynamics Models"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New.jpg\" alt=\"Smart Replay - flowchart diagram showing the flow between Encoder, State Predictor, and Policy\" class=\"wp-image-1161128\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-240x135.jpg 240w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<div style=\"padding-bottom:0; padding-top:0\" class=\"wp-block-msr-immersive-section alignfull row wp-block-msr-immersive-section\">\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wp-block-msr-immersive-section__inner wp-block-msr-immersive-section__inner--narrow\">\n\t\t\t<div class=\"wp-block-columns mb-10 pb-1 pr-1 is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\" style=\"box-shadow:var(--wp--preset--shadow--outlined)\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h2 class=\"wp-block-heading h3\" id=\"at-a-glance\">At a glance<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Imitation learning becomes easier when an AI&nbsp;agent&nbsp;understands why an action is taken.<\/li>\n\n\n\n<li>Predictive Inverse Dynamics Models (PIDMs)&nbsp;predict&nbsp;plausible future states,&nbsp;clarifying the direction of behavior during imitation&nbsp;learning.<\/li>\n\n\n\n<li>Even imperfect predictions reduce ambiguity,&nbsp;making&nbsp;it clearer which action makes sense&nbsp;in the moment.<\/li>\n\n\n\n<li>This makes PIDMs far more data\u2011efficient than traditional approaches.<\/li>\n<\/ul>\n<\/div>\n<\/div>\t\t<\/div>\n\t<\/div>\n\n\t<\/div>\n\n\n\n<p>Imitation&nbsp;learning&nbsp;teaches&nbsp;AI agents by example: show the agent recordings of how people perform a task and let it&nbsp;infer&nbsp;what to do.&nbsp;The&nbsp;most common&nbsp;approach,&nbsp;Behavior Cloning&nbsp;(BC),&nbsp;frames this as a simple question: 
\u201cGiven the current state&nbsp;of the environment, what action&nbsp;would&nbsp;an expert take?\u201d<\/p>\n\n\n\n<p>In practice, this is done through supervised learning, where the states serve as inputs and expert actions as outputs. While simple in principle, BC often requires large demonstration datasets to account for the natural variability in human behavior, but collecting such datasets can be costly and difficult in real-world settings.<\/p>\n\n\n\n<p>Predictive Inverse Dynamics Models (PIDMs) offer a different take on imitation learning by changing how agents interpret human behavior. Instead of directly mapping states to actions, PIDMs break down the problem into two subproblems: predicting what should happen next and inferring an appropriate action to go from the current state to the predicted future state. While PIDMs often outperform BC, it has not been clear why they work so well, motivating a closer look at the mechanisms behind their performance.<\/p>\n\n\n\n<p>In the paper, \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/when-does-predictive-inverse-dynamics-outperform-behavior-cloning\/\">When does predictive inverse dynamics outperform behavior cloning?<\/a>\u201d we show how this two-stage approach enables PIDMs to learn effective policies from far fewer demonstrations than BC. By grounding the selection process in a plausible future, PIDMs provide a clearer basis for choosing an action&nbsp;during inference. In practice, this can mean achieving comparable performance with as few as one-fifth the demonstrations required by BC, even when predictions are imperfect.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1009\" height=\"658\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/SmartReplay_FIG1.png\" alt=\"Figure\u202f1.\u202fBC vs. 
PIDM architectures. (Top) Behavior Cloning learns a direct mapping from the current state to an action. (Bottom) PIDMs add a state predictor that predicts future states. They then use an inverse dynamics model to predict the action required to move from the current state toward that future state. Both approaches share a common latent representation through a shared state encoder.\" class=\"wp-image-1161185\" style=\"width:600px\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/SmartReplay_FIG1.png 1009w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/SmartReplay_FIG1-300x196.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/SmartReplay_FIG1-768x501.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/02\/SmartReplay_FIG1-240x157.png 240w\" sizes=\"auto, (max-width: 1009px) 100vw, 1009px\" \/><figcaption class=\"wp-element-caption\">Figure 1. BC vs. PIDM architectures. (Top) Behavior Cloning learns a direct mapping from the current state to an action. (Bottom) PIDMs add a state predictor that predicts future states. They then use an inverse dynamics model to predict the action required to move from the current state toward that future state.
Both approaches\u202fshare a common latent representation through a shared state\u202fencoder.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-pidms-rethink-imitation\">How PIDMs rethink imitation<\/h2>\n\n\n\n<p>PIDMs\u2019 approach to imitation learning consists of two core elements: a model that forecasts plausible future states, and an inverse dynamics model (IDM) that predicts the action needed to move from the present state toward that future. Instead of asking, \u201cWhat action would an expert take?\u201d PIDMs effectively ask, \u201cWhat would an expert try to achieve, and what action would lead to it?\u201d This shift turns the information in the current observation (e.g., video frame) into a coherent sense of direction, reducing ambiguity about intent and making action prediction easier.<\/p>\n\n\n\n\t<div class=\"border-bottom border-top border-gray-300 mt-5 mb-5 msr-promo text-center text-md-left alignwide\" data-bi-aN=\"promo\" data-bi-id=\"1160910\">\n\t\t\n\n\t\t<p class=\"msr-promo__label text-gray-800 text-center text-uppercase\">\n\t\t<span class=\"px-4 bg-white display-inline-block font-weight-semibold small\">video series<\/span>\n\t<\/p>\n\t\n\t<div class=\"row pt-3 pb-4 align-items-center\">\n\t\t\t\t\t\t<div class=\"msr-promo__media col-12 col-md-5\">\n\t\t\t\t<a class=\"bg-gray-300 display-block\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/story\/on-second-thought\/\" aria-label=\"On Second Thought\" data-bi-cN=\"On Second Thought\" target=\"_blank\">\n\t\t\t\t\t<img decoding=\"async\" class=\"w-100 display-block\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/MFST_feature_SecondThought_1400x788.jpg\" alt=\"On Second Thought with Sinead Bovell\" \/>\n\t\t\t\t<\/a>\n\t\t\t<\/div>\n\t\t\t\n\t\t\t<div class=\"msr-promo__content p-3 px-5 col-12 col-md\">\n\n\t\t\t\t\t\t\t\t\t<h2 class=\"h4\">On Second Thought<\/h2>\n\t\t\t\t\n\t\t\t\t\t\t\t\t<p id=\"on-second-thought\" 
class=\"large\">A video series with Sinead Bovell built around the questions everyone\u2019s asking about AI. With expert voices from across Microsoft, we break down the tension and promise of this rapidly changing technology, exploring what\u2019s evolving and what\u2019s possible.<\/p>\n\t\t\t\t\n\t\t\t\t\t\t\t\t<div class=\"wp-block-buttons justify-content-center justify-content-md-start\">\n\t\t\t\t\t<div class=\"wp-block-button\">\n\t\t\t\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/story\/on-second-thought\/\" aria-describedby=\"on-second-thought\" class=\"btn btn-brand glyph-append glyph-append-chevron-right\" data-bi-cN=\"On Second Thought\" target=\"_blank\">\n\t\t\t\t\t\t\tExplore the series\t\t\t\t\t\t<\/a>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<\/div><!--\/.msr-promo__content-->\n\t<\/div><!--\/.msr-promo__inner-wrap-->\n\t<\/div><!--\/.msr-promo-->\n\t\n\n\n<h2 class=\"wp-block-heading\" id=\"real-world-validation-in-a-3d-gameplay-environment\">Real-world validation in a 3D gameplay environment<\/h2>\n\n\n\n<p>To\u00a0evaluate\u00a0PIDMs\u00a0under realistic conditions,\u00a0we trained\u00a0agents on human gameplay demonstrations in a visually rich video game. These conditions\u00a0include\u00a0operating\u00a0directly from raw video\u00a0input, interacting with\u00a0a complex 3D\u00a0environment in real time at 30\u202fframes\u00a0per\u00a0second, and\u00a0handling\u00a0visual artifacts and unpredictable system delays.\u00a0\u00a0<\/p>\n\n\n\n<p>The agents ran from beginning to end, taking video frames as input and continuously deciding which buttons to press and how to move the joysticks. Instead of relying on a hand-coded set of game variables and rules, the model worked directly from visual input, using past examples to predict what comes next and choosing actions that moved play in that direction.<\/p>\n\n\n\n<p>We ran all experiments on a cloud gaming platform, which introduced additional delays and visual distortions. 
Despite these challenges, the PIDM agents consistently matched human patterns of play and achieved high success rates across tasks, as shown in Video 1 below and Videos 2 and 3 in the appendix.<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Smart Replay demo: exercise (video 1)\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/Jfjt_k6Pw1k?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Video 1. A player&nbsp;(left)&nbsp;and a PIDM agent&nbsp;(right)&nbsp;side by side playing the game&nbsp;<em>Bleeding Edge<\/em>.&nbsp;Both&nbsp;navigate the same trajectory,&nbsp;jumping over obstacles and engaging&nbsp;with&nbsp;nonplayer&nbsp;characters. Despite&nbsp;network delays, the&nbsp;agent closely matches the player&#8217;s timing and&nbsp;movement&nbsp;in real time.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"why-and-when-pidms-outperform-bc\">Why and when PIDMs outperform BC<\/h2>\n\n\n\n<p>Of course, AI agents do not have access to future outcomes. They can only generate predictions based on available data, and those predictions are sometimes wrong. This creates a central trade\u2011off for PIDMs.<\/p>\n\n\n\n<p>On one hand, anticipating where the agent should be heading can clarify what action makes sense in the present. Knowing the intended direction helps narrow an otherwise ambiguous choice. On the other hand, inaccurate predictions can occasionally steer the model toward the wrong action.<\/p>\n\n\n\n<p>The key insight is that these effects are not symmetric. 
While prediction errors introduce some risk, reducing ambiguity in the present often matters more. Our theoretical analysis shows that even with imperfect predictions, PIDMs outperform BC as long as the prediction error remains modest. If future states were known perfectly, PIDMs would outperform BC outright.<\/p>\n\n\n\n<p>In practice, this means that clarifying intent often matters more than accurately predicting the future. That advantage is most evident in the situations where BC struggles: where human behavior varies and actions are driven by underlying goals rather than by what is immediately visible on the screen.<\/p>\n\n\n\n<p>BC requires many demonstrations because each example is noisy and open to multiple interpretations. PIDMs, by contrast, sharpen each demonstration by linking actions to the future states they aim to reach. As a result, PIDMs can learn effective action strategies from far fewer examples.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"evaluation\">Evaluation<\/h2>\n\n\n\n<p>To test these ideas under realistic conditions, we designed a sequence of experiments that begins with a simple, interpretable 2D environment (Video 4 in the appendix) and culminates in a complex 3D video game. We trained both BC and PIDM on very small datasets, ranging from one to fifty demonstrations in the 2D environment and from five to thirty for the 3D video game. Across all tasks, PIDM reached high success rates with far fewer demonstrations than BC.<\/p>\n\n\n\n<p>In the 2D setting, BC needed two to five times more data to match PIDM\u2019s performance (Figure 2). In the 3D game, BC needed 66% more data to achieve comparable results (Video 5 in the appendix).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1166\" height=\"871\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay_blog_Fig2a-d.png\" alt=\"Figure 2. 
Performance gains in the 2D environment. As the number of training demonstrations increases, PIDM consistently achieves higher success rates than BC across all four tasks. Curves show mean performance over 20 repeated experiments, with shading indicating variability across runs.\" class=\"wp-image-1161012\" style=\"object-fit:cover\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay_blog_Fig2a-d.png 1166w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay_blog_Fig2a-d-300x224.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay_blog_Fig2a-d-1024x765.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay_blog_Fig2a-d-768x574.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay_blog_Fig2a-d-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay_blog_Fig2a-d-240x180.png 240w\" sizes=\"auto, (max-width: 1166px) 100vw, 1166px\" \/><figcaption class=\"wp-element-caption\">Figure 2. Performance gains in the 2D environment. As the number of training demonstrations increases, PIDM consistently achieves higher success rates than BC across all four tasks. Curves show mean performance over 20 repeated experiments, with shading indicating variability across runs.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"takeaway-intent-matters-in-imitation-learning\">Takeaway: Intent matters in imitation learning<\/h2>\n\n\n\n<p>The main message of our investigation is simple: imitation becomes easier when intent is made explicit. Predicting a plausible future, even an imperfect one, helps resolve ambiguity about which action makes sense right now, much like driving more confidently in the fog when the driver already knows where the road is headed.
PIDM shifts imitation learning from pure copying toward goal-oriented action.<\/p>\n\n\n\n<p>This approach has limits. If predictions of future states become too unreliable, they can mislead the model about the intended next move. In those cases, the added uncertainty can outweigh the benefit of reduced ambiguity, causing PIDM to underperform BC.<\/p>\n\n\n\n<p>But when predictions are reasonably accurate, reframing action prediction as \u201c<em>How do I get there from here<\/em>?\u201d helps explain why learning from small, messy human datasets can be surprisingly effective. In settings where data is expensive and demonstrations are limited, that shift in perspective can make a meaningful difference.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"appendix-visualizations-and-results-videos\">Appendix: Visualizations and results (videos)<\/h2>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"wp-block-heading h4\" id=\"a-player-a-naive-action-replay-baseline-and-a-pidm-agent-playing-bleeding-edge-1\">A player, a na\u00efve action-replay baseline, and a PIDM agent playing <em>Bleeding Edge<\/em><\/h3>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Smart Replay demo: Player, replay baseline, and PIDM (video 2)\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/x1WdGiX4QYk?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Video 2.
(Left) The player completes the task under normal conditions. (Middle) The baseline replays the recorded actions at their original timestamps, which initially appears to work. Because the game runs on a cloud gaming platform, however, random network delays quickly push the replay out of sync, causing the trajectory to fail. (Right) Under the same conditions, the PIDM agent behaves differently. Instead of naively replaying actions, it continuously interprets visual input, predicts how the behavior is likely to unfold, and adapts its actions in real time. This allows it to correct for delays, recover from deviations, and successfully reproduce the task in settings where na\u00efve replay inevitably fails.<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"wp-block-heading h4\" id=\"a-player-and-a-pidm-agent-performing-a-complex-task-in-bleeding-edge\">A player and a PIDM agent performing a complex task in <em>Bleeding Edge<\/em><\/h3>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Smart Replay demo: Player and PIDM performing a complex task (video 3)\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/gUbIsAcsW6w?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Video 3. In this video, the task exhibits strong partial observability: correct behavior depends on whether a location is being visited for the first or second time.
For example,&nbsp;in the first encounter, the agent proceeds straight up the ramp; on the second, it turns right toward the bridge. Similarly, it may jump over a box on the first pass but walk around it on the second. The PIDM agent reproduces this trajectory reliably, using coarse future guidance to select actions in the correct direction.<\/figcaption><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"wp-block-heading h4\" id=\"visualization-of-the-2d-navigation-environment\">Visualization of the 2D navigation environment<\/h3>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Smart Replay demo: 2D navigation task visualization (video 4)\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/PfU2gMXqQ8c?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Video&nbsp;4.&nbsp;These&nbsp;videos show ten demonstrations for each of four tasks: Four Room, Zigzag, Maze, and Multiroom. In all cases, the setup is the same: the character (blue box) moves through the environment and must reach a sequence of goals (red squares).&nbsp;The overlaid trajectories visualize the paths the player took; the models never see these paths. Instead, they observe only their character\u2019s current location, the position of all goals, and whether each goal has already been reached. 
Because these demonstrations come from real players, no two paths are identical: players pause, take detours, or correct small mistakes along the way. That natural variability is exactly what the models must learn to handle.<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"wp-block-heading h4\" id=\"pidm-vs-bc-in-a-3d-environment\">PIDM vs. BC in a 3D environment<\/h3>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Smart Replay demo: PIDM vs. BC in a 3D environment (video 5)\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/iXgDXSVPJxY?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Video 5. The PIDM agent achieves an 85% success rate with only fifteen training demonstrations. The BC agent struggles to stay on track and levels off around 60%. The contrast illustrates how differently the two approaches perform when training data is limited.<\/figcaption><\/figure>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>This research looks at why Predictive Inverse Dynamics Models often outperform standard Behavior Cloning in imitation learning.
By using simple predictions of what happens next, PIDMs reduce ambiguity and learn from far fewer demonstrations.<\/p>\n","protected":false},"author":43868,"featured_media":1161128,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Pallavi Choudhury","user_id":"33184"},{"type":"user_nicename","value":"Lukas Sch&auml;fer","user_id":"43602"},{"type":"user_nicename","value":"Chris Lovett","user_id":"36027"},{"type":"user_nicename","value":"Katja Hofmann","user_id":"32468"},{"type":"user_nicename","value":"Sergio Valcarcel Macua","user_id":"42507"}],"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1160901","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199561,199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[583324],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Pallavi Choudhury","user_id":33184,"display_name":"Pallavi Choudhury","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pallavic\/\" aria-label=\"Visit the profile page for Pallavi Choudhury\">Pallavi 
Choudhury<\/a>","is_active":false,"last_first":"Choudhury, Pallavi","people_section":0,"alias":"pallavic"},{"type":"user_nicename","value":"Lukas Sch&auml;fer","user_id":43602,"display_name":"Lukas Sch&auml;fer","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/t-luschaefer\/\" aria-label=\"Visit the profile page for Lukas Sch&auml;fer\">Lukas Sch&auml;fer<\/a>","is_active":false,"last_first":"Sch\u00e4fer, Lukas","people_section":0,"alias":"t-luschaefer"},{"type":"user_nicename","value":"Chris Lovett","user_id":36027,"display_name":"Chris Lovett","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/clovett\/\" aria-label=\"Visit the profile page for Chris Lovett\">Chris Lovett<\/a>","is_active":false,"last_first":"Lovett, Chris","people_section":0,"alias":"clovett"},{"type":"user_nicename","value":"Katja Hofmann","user_id":32468,"display_name":"Katja Hofmann","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/kahofman\/\" aria-label=\"Visit the profile page for Katja Hofmann\">Katja Hofmann<\/a>","is_active":false,"last_first":"Hofmann, Katja","people_section":0,"alias":"kahofman"},{"type":"user_nicename","value":"Sergio Valcarcel Macua","user_id":42507,"display_name":"Sergio Valcarcel Macua","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sergiov\/\" aria-label=\"Visit the profile page for Sergio Valcarcel Macua\">Sergio Valcarcel Macua<\/a>","is_active":false,"last_first":"Valcarcel Macua, Sergio","people_section":0,"alias":"sergiov"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-960x540.jpg\" class=\"img-object-cover\" alt=\"Smart Replay - flowchart diagram showing the flow between Encoder, State Predictor, and Policy\" decoding=\"async\" loading=\"lazy\" 
srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/SmartReplay-BlogHeroFeature-1400x788_New.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"February 5, 2026","formattedExcerpt":"This research looks at why Predictive Inverse Dynamics Models often outperform standard Behavior Cloning in imitation learning. 
By using simple predictions of what happens next, PIDMs reduce ambiguity and learn from far fewer demonstrations.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160901","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/43868"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1160901"}],"version-history":[{"count":68,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160901\/revisions"}],"predecessor-version":[{"id":1161296,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160901\/revisions\/1161296"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1161128"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1160901"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1160901"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1160901"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1160901"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1160901"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1160901"},
{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1160901"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1160901"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1160901"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1160901"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1160901"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}