{"version":"1.0","provider_name":"Microsoft Research","provider_url":"https:\/\/www.microsoft.com\/en-us\/research","author_name":"Romain Laroche","author_url":"https:\/\/www.microsoft.com\/en-us\/research\/people\/rolaroch\/","title":"Training Dialogue Systems With Human Advice - Microsoft Research","type":"rich","width":600,"height":338,"html":"<blockquote class=\"wp-embedded-content\" data-secret=\"hA41WTFHsS\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/training-dialogue-systems-human-advice\/\">Training Dialogue Systems With Human Advice<\/a><\/blockquote><iframe sandbox=\"allow-scripts\" security=\"restricted\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/training-dialogue-systems-human-advice\/embed\/#?secret=hA41WTFHsS\" width=\"600\" height=\"338\" title=\"&#8220;Training Dialogue Systems With Human Advice&#8221; &#8212; Microsoft Research\" data-secret=\"hA41WTFHsS\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" class=\"wp-embedded-content\"><\/iframe><script type=\"text\/javascript\">\n\/* <![CDATA[ *\/\n\/*! This file is auto-generated *\/\n!function(d,l){\"use strict\";l.querySelector&&d.addEventListener&&\"undefined\"!=typeof URL&&(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&&!\/[^a-zA-Z0-9]\/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret=\"'+t.secret+'\"]'),o=l.querySelectorAll('blockquote[data-secret=\"'+t.secret+'\"]'),c=new RegExp(\"^https?:$\",\"i\"),i=0;i<o.length;i++)o[i].style.display=\"none\";for(i=0;i<a.length;i++)s=a[i],e.source===s.contentWindow&&(s.removeAttribute(\"style\"),\"height\"===t.message?(1e3<(r=parseInt(t.value,10))?r=1e3:~~r<200&&(r=200),s.height=r):\"link\"===t.message&&(r=new URL(s.getAttribute(\"src\")),n=new URL(t.value),c.test(n.protocol))&&n.host===r.host&&l.activeElement===s&&(d.top.location.href=t.value))}},d.addEventListener(\"message\",d.wp.receiveEmbedMessage,!1),l.addEventListener(\"DOMContentLoaded\",function(){for(var e,t,s=l.querySelectorAll(\"iframe.wp-embedded-content\"),r=0;r<s.length;r++)(t=(e=s[r]).getAttribute(\"data-secret\"))||(t=Math.random().toString(36).substring(2,12),e.src+=\"#?secret=\"+t,e.setAttribute(\"data-secret\",t)),e.contentWindow.postMessage({message:\"ready\",secret:t},\"*\")},!1)))}(window,document);\n\/\/# sourceURL=https:\/\/www.microsoft.com\/en-us\/research\/wp-includes\/js\/wp-embed.min.js\n\/* ]]> *\/\n<\/script>\n","description":"One major drawback of Reinforcement Learning (RL) Spoken Dialogue Systems is that they inherit from the general exploration requirements of RL which makes them hard to deploy from an industry perspective. On the other hand, industrial systems rely on human expertise and hand written rules so as to avoid irrelevant behavior to happen and maintain [&hellip;]"}