{"id":305843,"date":"2016-10-14T23:05:56","date_gmt":"2016-10-15T06:05:56","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=305843"},"modified":"2018-10-16T22:05:00","modified_gmt":"2018-10-17T05:05:00","slug":"end-end-lstm-based-dialog-control-optimized-supervised-reinforcement-learning","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/end-end-lstm-based-dialog-control-optimized-supervised-reinforcement-learning\/","title":{"rendered":"End-To-End LSTM-Based Dialog Control Optimized With Supervised And Reinforcement Learning"},"content":{"rendered":"\n\n\n<p class=\"wp-block-paragraph\">This paper presents a model for end-to-end learning of task-oriented dialog systems. The main component of the model is a recurrent neural network (an LSTM), which maps from raw dialog history directly to a distribution over system actions. The LSTM automatically infers a representation of dialog history, which relieves the system developer of much of the manual feature engineering of dialog state. In addition, the developer can provide software that expresses business rules and provides access to programmatic APIs, enabling the LSTM to take actions in the real world on behalf of the user. The LSTM can be optimized using supervised learning (SL), where a domain expert provides example dialogs which the LSTM should imitate; or using reinforcement learning (RL), where the system improves by interacting directly with end users. Experiments show that SL and RL are complementary: SL alone can derive a reasonable initial policy from a small number of training dialogs; and starting RL optimization with a policy trained with SL substantially accelerates the learning rate of RL.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This paper presents a model for end-to-end learning of task-oriented dialog systems. 
The main component of the model is a recurrent neural network (an LSTM), which maps from raw dialog history directly to a distribution over system actions. The LSTM automatically infers a representation of dialog history, which relieves the system developer of much of [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"jawillia","user_id":"32190"},{"type":"user_nicename","value":"gzweig","user_id":"31938"}],"msr_publishername":"arxiv","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"MSR-TR-2016-72","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"1606.01269","msr_mag_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_release_tracker_id":"","msr_highlight_type":"","msr_date_display_format":"","msr_main_download_label":"","msr_external_link_label":"","msr_doi_label":"","msr_published_date":"2016-06-03","msr_startdate":"","msr_presentation_date":"","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"https:\/\/arxiv.org\/abs\/1606.01269","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_year":2016,"msr_month":6,"msr_day":3,"msr_microsoftintellectualproperty":true,"msr_pub_id":"","msr_publication_uploader":[{"type":"file","title":"williams2016lstm","label_id":243132,"id":305846,"viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2
016\/10\/williams2016lstm.pdf"},{"type":"url","title":"https:\/\/arxiv.org\/abs\/1606.01269","label_id":252679,"id":false,"viewUrl":false}],"msr_related_uploader":[],"msr_original_fields_of_study":[],"msr_s2_paper_id":"","msr_s2_pdf_url":"","msr_citation_count_updated":"","msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[{"provider":"arxiv","id":"1606.01269"}],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13545],"msr-publication-type":[193718],"msr-publisher":[],"msr-publication-cta":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-305843","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"arxiv","msr_edition":"","msr_affiliation":"","msr_published_date":"2016-06-03","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2016-72","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"https:\/\/arxiv.org\/abs\/1606.01269","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"williams2016lstm","label_id":243132,"id":305846,"viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/williams2016lstm.pdf"},{"type":"url","title":"https:\/\/arxiv.org\/abs\/1606.01269","label_id":252679,
"id":false,"viewUrl":false}],"msr_related_uploader":[],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"1606.01269","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"https:\/\/arxiv.org\/abs\/1606.01269"}],"msr-author-ordering":[{"type":"user_nicename","value":"jawillia","user_id":32190,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=jawillia"},{"type":"user_nicename","value":"gzweig","user_id":31938,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=gzweig"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[390593,395930],"msr_project":[393245,377990,295931,171313],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"techreport","related_content":{"projects":[{"ID":393245,"post_title":"Conversational Intelligence","post_name":"conversational-intelligence","post_type":"msr-project","post_date":"2017-07-05 10:01:45","post_modified":"2017-11-15 13:39:25","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/conversational-intelligence\/","post_excerpt":"Intelligent agents that can handle human language play a growing role in personalized, ubiquitous computing and the everyday use of devices. Agents need to be able to communicate and collaborate with humans in ways that are seamless and natural, and to be able to learn new behaviors, concepts, and relationships as first-class operations. In other words, our devices need to be able to converse with us. 
In this project, Microsoft Research AI teams are interested&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/393245"}]}},{"ID":377990,"post_title":"Deep Reinforcement Learning for Goal-Oriented Dialogues","post_name":"deep-reinforcement-learning-goal-oriented-dialogue","post_type":"msr-project","post_date":"2017-04-18 11:51:36","post_modified":"2019-08-19 10:03:33","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/deep-reinforcement-learning-goal-oriented-dialogue\/","post_excerpt":"Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems, at SLT 2018. [Proposal] All the data, source code and schedule information will be updated here. This project aims to develop intelligent dialogue agents to help users effectively accomplish tasks via natural language conversation. A typical goal-oriented dialogue system contains three major components: natural language understanding (NLU), natural language generation (NLG), and dialogue management (DM) that consists of state tracking and policy learning. Our research focus is&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990"}]}},{"ID":295931,"post_title":"Chatbots and\u00a0Conversation As A Platform (CAAP)","post_name":"chatbots-conversation-platform-caap","post_type":"msr-project","post_date":"2016-09-21 23:16:41","post_modified":"2017-06-05 12:48:54","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/chatbots-conversation-platform-caap\/","post_excerpt":"At\u00a0the Microsoft Build 2016 event, Microsoft CEO Satya Nadella said\u00a0that chatbots, as the next big thing, will have\u00a0\u201cas profound an impact as previous shifts we\u2019ve had.\u201d\u00a0The past paradigm shifts include the graphical user interface, the web browser and the touchscreen. 
Conversation As\u00a0A Platform (CAAP) has\u00a0the promise of making booking a flight or buying a new shirt as easy as sending a text message,\u00a0with the potential to make computing more\u00a0accessible to users\u00a0on mobile devices. This group has been working on&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/295931"}]}},{"ID":171313,"post_title":"Dialog and Conversational Systems Research","post_name":"dialog-and-conversational-systems-research","post_type":"msr-project","post_date":"2014-03-14 09:46:35","post_modified":"2017-07-11 15:34:26","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/dialog-and-conversational-systems-research\/","post_excerpt":"Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively, or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety of settings, such as conversational systems in apps or devices, and situated interactions in the real world. 
Projects Spoken Language Understanding","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171313"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/305843","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/305843\/revisions"}],"predecessor-version":[{"id":400346,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/305843\/revisions\/400346"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=305843"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=305843"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=305843"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=305843"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=305843"},{"taxonomy":"msr-publication-cta","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-cta?post=305843"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=305843"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=305843"
},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=305843"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=305843"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=305843"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=305843"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=305843"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=305843"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}