RL Theory - Microsoft Research

1.0Microsoft Researchhttps://www.microsoft.com/en-us/researchYue Wanghttps://www.microsoft.com/en-us/research/people/yuwang5/RL Theory - Microsoft Researchrich600338<blockquote class="wp-embedded-content" data-secret="obrMYRLMpO"><a href="https://www.microsoft.com/en-us/research/project/rl-theory/">RL Theory</a></blockquote><iframe sandbox="allow-scripts" security="restricted" src="https://www.microsoft.com/en-us/research/project/rl-theory/embed/#?secret=obrMYRLMpO" width="600" height="338" title="“RL Theory” — Microsoft Research" data-secret="obrMYRLMpO" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" class="wp-embedded-content"></iframe><script> /*! This file is auto-generated */ !function(d,l){"use strict";l.querySelector&&d.addEventListener&&"undefined"!=typeof URL&&(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&&!/[^a-zA-Z0-9]/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret="'+t.secret+'"]'),o=l.querySelectorAll('blockquote[data-secret="'+t.secret+'"]'),c=new RegExp("^https?:$","i"),i=0;i<o.length;i++)o[i].style.display="none";for(i=0;i<a.length;i++)s=a[i],e.source===s.contentWindow&&(s.removeAttribute("style"),"height"===t.message?(1e3<(r=parseInt(t.value,10))?r=1e3:~~r<200&&(r=200),s.height=r):"link"===t.message&&(r=new URL(s.getAttribute("src")),n=new URL(t.value),c.test(n.protocol))&&n.host===r.host&&l.activeElement===s&&(d.top.location.href=t.value))}},d.addEventListener("message",d.wp.receiveEmbedMessage,!1),l.addEventListener("DOMContentLoaded",function(){for(var e,t,s=l.querySelectorAll("iframe.wp-embedded-content"),r=0;r<s.length;r++)(t=(e=s[r]).getAttribute("data-secret"))||(t=Math.random().toString(36).substring(2,12),e.src+="#?secret="+t,e.setAttribute("data-secret",t)),e.contentWindow.postMessage({message:"ready",secret:t},"*")},!1)))}(window,document); //# sourceURL=https://www.microsoft.com/en-us/research/wp-includes/js/wp-embed.min.js </script> Finding the optimal policy of a given controlled problem is a fundamental task in artificial intelligence. The reinforcement problem faces the following challenges: (1) No pre-given I.I.D. data. (2) No direct supervised label as in supervised learning. (3) Hard to do the optimization. To propose efficient algorithms to solve the control problem, we need the theoretical […]