{"version":"1.0","provider_name":"Microsoft Research","provider_url":"https:\/\/www.microsoft.com\/en-us\/research","author_name":"Andrey Kolobov","author_url":"https:\/\/www.microsoft.com\/en-us\/research\/people\/akolobov\/","title":"Heuristic-Guided Reinforcement Learning - Microsoft Research","type":"rich","width":600,"height":338,"html":"<blockquote class=\"wp-embedded-content\" data-secret=\"xJ7OxQbRjQ\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/heuristic-guided-reinforcement-learning\/\">Heuristic-Guided Reinforcement Learning<\/a><\/blockquote><iframe sandbox=\"allow-scripts\" security=\"restricted\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/heuristic-guided-reinforcement-learning\/embed\/#?secret=xJ7OxQbRjQ\" width=\"600\" height=\"338\" title=\"&#8220;Heuristic-Guided Reinforcement Learning&#8221; &#8212; Microsoft Research\" data-secret=\"xJ7OxQbRjQ\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" class=\"wp-embedded-content\"><\/iframe><script>\n\/*! This file is auto-generated *\/\n!function(d,l){\"use strict\";l.querySelector&&d.addEventListener&&\"undefined\"!=typeof URL&&(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&&!\/[^a-zA-Z0-9]\/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret=\"'+t.secret+'\"]'),o=l.querySelectorAll('blockquote[data-secret=\"'+t.secret+'\"]'),c=new RegExp(\"^https?:$\",\"i\"),i=0;i<o.length;i++)o[i].style.display=\"none\";for(i=0;i<a.length;i++)s=a[i],e.source===s.contentWindow&&(s.removeAttribute(\"style\"),\"height\"===t.message?(1e3<(r=parseInt(t.value,10))?r=1e3:~~r<200&&(r=200),s.height=r):\"link\"===t.message&&(r=new URL(s.getAttribute(\"src\")),n=new URL(t.value),c.test(n.protocol))&&n.host===r.host&&l.activeElement===s&&(d.top.location.href=t.value))}},d.addEventListener(\"message\",d.wp.receiveEmbedMessage,!1),l.addEventListener(\"DOMContentLoaded\",function(){for(var e,t,s=l.querySelectorAll(\"iframe.wp-embedded-content\"),r=0;r<s.length;r++)(t=(e=s[r]).getAttribute(\"data-secret\"))||(t=Math.random().toString(36).substring(2,12),e.src+=\"#?secret=\"+t,e.setAttribute(\"data-secret\",t)),e.contentWindow.postMessage({message:\"ready\",secret:t},\"*\")},!1)))}(window,document);\n\/\/# sourceURL=https:\/\/www.microsoft.com\/en-us\/research\/wp-includes\/js\/wp-embed.min.js\n<\/script>\n","description":"We provide a framework for accelerating reinforcement learning (RL) algorithms by heuristics constructed from domain knowledge or offline data. Tabula rasa RL algorithms require environment interactions or computation that scales with the horizon of the sequential decision-making task. Using our framework, we show how heuristic-guided RL induces a much shorter-horizon subproblem that provably solves the [&hellip;]"}