<?xml version="1.0"?>
<oembed><version>1.0</version><provider_name>Microsoft Research</provider_name><provider_url>https://www.microsoft.com/en-us/research</provider_url><author_name>Yue Wang</author_name><author_url>https://www.microsoft.com/en-us/research/people/yuwang5/</author_url><title>RL Theory - Microsoft Research</title><type>rich</type><width>600</width><height>338</height><html>&lt;blockquote class="wp-embedded-content" data-secret="Ofxwx7LiYd"&gt;&lt;a href="https://www.microsoft.com/en-us/research/project/rl-theory/"&gt;RL Theory&lt;/a&gt;&lt;/blockquote&gt;&lt;iframe sandbox="allow-scripts" security="restricted" src="https://www.microsoft.com/en-us/research/project/rl-theory/embed/#?secret=Ofxwx7LiYd" width="600" height="338" title="&#x201C;RL Theory&#x201D; &#x2014; Microsoft Research" data-secret="Ofxwx7LiYd" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" class="wp-embedded-content"&gt;&lt;/iframe&gt;&lt;script type="text/javascript"&gt;
/* &lt;![CDATA[ */
/*! This file is auto-generated */
!function(d,l){"use strict";l.querySelector&amp;&amp;d.addEventListener&amp;&amp;"undefined"!=typeof URL&amp;&amp;(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&amp;&amp;!/[^a-zA-Z0-9]/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret="'+t.secret+'"]'),o=l.querySelectorAll('blockquote[data-secret="'+t.secret+'"]'),c=new RegExp("^https?:$","i"),i=0;i&lt;o.length;i++)o[i].style.display="none";for(i=0;i&lt;a.length;i++)s=a[i],e.source===s.contentWindow&amp;&amp;(s.removeAttribute("style"),"height"===t.message?(1e3&lt;(r=parseInt(t.value,10))?r=1e3:~~r&lt;200&amp;&amp;(r=200),s.height=r):"link"===t.message&amp;&amp;(r=new URL(s.getAttribute("src")),n=new URL(t.value),c.test(n.protocol))&amp;&amp;n.host===r.host&amp;&amp;l.activeElement===s&amp;&amp;(d.top.location.href=t.value))}},d.addEventListener("message",d.wp.receiveEmbedMessage,!1),l.addEventListener("DOMContentLoaded",function(){for(var e,t,s=l.querySelectorAll("iframe.wp-embedded-content"),r=0;r&lt;s.length;r++)(t=(e=s[r]).getAttribute("data-secret"))||(t=Math.random().toString(36).substring(2,12),e.src+="#?secret="+t,e.setAttribute("data-secret",t)),e.contentWindow.postMessage({message:"ready",secret:t},"*")},!1)))}(window,document);
//# sourceURL=https://www.microsoft.com/en-us/research/wp-includes/js/wp-embed.min.js
/* ]]&gt; */
&lt;/script&gt;
</html><description>Finding the optimal policy of a given controlled problem is a fundamental task in artificial intelligence.&nbsp;The reinforcement problem faces the following challenges: (1) No pre-given I.I.D. data. (2) No direct supervised label as in supervised learning. (3) Hard to do the optimization. To propose efficient algorithms to solve the control problem, we need the theoretical [&hellip;]</description></oembed>
