{"id":444522,"date":"2017-12-06T12:45:40","date_gmt":"2017-12-06T20:45:40","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=444522"},"modified":"2017-12-07T09:19:15","modified_gmt":"2017-12-07T17:19:15","slug":"deliberation-networks","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/deliberation-networks\/","title":{"rendered":"Deliberation Network: Pushing the frontiers of neural machine translation"},"content":{"rendered":"<p>During the Tang dynasty of China, which lasted from 618 to 907, the poet Jia Dao was known for polishing his poems over and over to make them better and better. One famous story describes how he deliberated over two lines of a poem that read, \u201cBirds nestle in the trees by the pond. A monk pushes the door in the moonlight.<em>\u201d<\/em> Dao concentrated on the word \u201cpushes.\u201d He considered using \u201cknocks\u201d instead. After a long period of deliberation, he chose \u201cknocks\u201d because it provides contrast to the tranquil atmosphere of the suburban night. 
Such a careful rethinking process led to a beautiful poem that is still widely known today:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-447903 aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Capture.jpg\" alt=\"\" width=\"810\" height=\"499\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Capture.jpg 1214w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Capture-300x185.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Capture-768x473.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Capture-1024x631.jpg 1024w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/p>\n<p>The story about Dao\u2019s deliberation over the words \u201cpushes\u201d and \u201cknocks\u201d illustrates an important human cognitive process: to compose great literary works, the first draft typically serves as a base to build on and polish until it yields the final version. The same is true of writing academic papers: first drafts are usually written by students and then revised or rewritten by their supervisors. In these cases, the first draft is typically unsatisfactory, but it provides a global textual skeleton for retouching and revision. 
The polishing process, which we call deliberation, improves the quality.<\/p>\n<p>In a paper presented at the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/nips.cc\/Conferences\/2017\/\">2017 Neural Information Processing Systems<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> conference in Long Beach, California, we describe our <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/deliberation-networks-sequence-generation-beyond-one-pass-decoding\/\">Deliberation Network<\/a> that we developed to enhance artificial intelligence, in particular natural language generation. Our network is inspired by the human process of deliberation.<\/p>\n<p style=\"text-align: left;\" align=\"center\">We start with the basic neural machine translation structure: the sequence-to-sequence model, shown in Figure 1. To translate a sentence in a source language, for example the Chinese sentence \u201c\u5fae\u8f6f\u4e9a\u6d32\u7814\u7a76\u9662\u5373\u5c06\u8fce\u6765\u4e8c\u5341\u5468\u5e74\u8bde\u8fb0,\u201d we first use a neural network called an encoder to scan over it and then pass the output to another neural network called the decoder. 
Based on the encoder\u2019s output, the decoder generates the translation word by word in the target language; in this case, \u201cMicrosoft Research is about to have twenty birthday,\u201d which is a rough translation.<\/p>\n<div id=\"attachment_444615\" style=\"width: 786px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-444615\" class=\"wp-image-444615\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-1.png\" alt=\"\" width=\"776\" height=\"190\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-1.png 1275w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-1-300x73.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-1-768x188.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-1-1024x251.png 1024w\" sizes=\"auto, (max-width: 776px) 100vw, 776px\" \/><p id=\"caption-attachment-444615\" class=\"wp-caption-text\"><em>Figure 1: Basic sequence-to-sequence structure to perform translation.<\/em><\/p><\/div>\n<p>As Figure 1 shows, this framework involves only one generation process: the decoder\u2019s output is taken as the final translation. We refer to such a decoder as the first pass decoder. 
To add the deliberation process, we add another decoder, the second pass decoder, to the network structure, as illustrated in Figure 2.<\/p>\n<div id=\"attachment_444618\" style=\"width: 786px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-444618\" class=\"wp-image-444618\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-2.png\" alt=\"\" width=\"776\" height=\"255\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-2.png 1354w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-2-300x99.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-2-768x252.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/11\/Figure-2-1024x337.png 1024w\" sizes=\"auto, (max-width: 776px) 100vw, 776px\" \/><p id=\"caption-attachment-444618\" class=\"wp-caption-text\"><em>Figure 2: The structure of Deliberation Network.<\/em><\/p><\/div>\n<p>Adding the second pass decoder expands translation into two steps: the first pass decoder reads the encoder\u2019s encoding of the source sentence <em>x<\/em> and outputs a rough, or first-draft, translation, denoted as <em>y&#8217;<\/em>. Then, the second pass decoder takes both <em>x<\/em> and <em>y&#8217;<\/em> as inputs to generate the final translation <em>y<\/em>. The second step is the deliberation process: <em>y&#8217;<\/em>, the output of the first pass decoder, serves as the draft, which the second pass decoder learns to revise into <em>y<\/em>. 
An example is shown in Figure 2: the original translation, \u201cMicrosoft Research is about to have twenty birthday,\u201d is revised to \u201cMicrosoft Research Asia will celebrate twentieth anniversary of birth.\u201d<\/p>\n<p>We verify the effectiveness of our Deliberation Network on a large benchmark dataset, the English-French translation task from the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.statmt.org\/wmt14\/\">Ninth Workshop on Statistical Machine Translation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in 2014, which contains about 36 million training sentence pairs and about 3 thousand test sentences. The results are shown in Table 1. The baseline systems include several of the most powerful neural machine translation systems in the current literature, such as GNMT, FairS2S and the Transformer. We conclude that, together with our previous dual learning technique to effectively leverage monolingual data, we achieve the best single-model performance (41.5) on this task using the Deliberation Network, on top of a <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/dl.acm.org\/citation.cfm?id=1246450\">simple stacked LSTM architecture<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<table class=\"aligncenter\" style=\"height: 172px; border-collapse: collapse; border-spacing: inherit;\" border=\"1\" width=\"618\">\n<caption><em>Table 1: Comparison between Deliberation Network and different deep NMT systems on WMT&#8217;14 En->Fr.<\/em><\/caption>\n<tbody>\n<tr>\n<td style=\"padding: inherit; border: 1px solid;\"><strong>System<\/strong><\/td>\n<td style=\"padding: inherit; border: 1px solid;\"><strong>Configurations<\/strong><\/td>\n<td style=\"padding: inherit; border: 1px solid;\"><strong>BLEU<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"padding: inherit; border: 
1px solid;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1609.08144\">GNMT<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/td>\n<td style=\"padding: inherit; border: 1px solid;\">8-8 stacked LSTM encoder and decoder + RL finetune<\/td>\n<td style=\"padding: inherit; border: 1px solid;\">39.92<\/td>\n<\/tr>\n<tr>\n<td style=\"padding: inherit; border: 1px solid;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1705.03122\">FairS2S<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/td>\n<td style=\"padding: inherit; border: 1px solid;\">15-15 convolutional encoder and decoder<\/td>\n<td style=\"padding: inherit; border: 1px solid;\">40.51<\/td>\n<\/tr>\n<tr>\n<td style=\"padding: inherit; border: 1px solid;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1706.03762v4\">Transformer<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/td>\n<td style=\"padding: inherit; border: 1px solid;\">6-6 self-attention encoder and decoder<\/td>\n<td style=\"padding: inherit; border: 1px solid;\">41.0<\/td>\n<\/tr>\n<tr>\n<td style=\"padding: inherit; border: 1px solid;\" rowspan=\"3\">This work<\/td>\n<td style=\"padding: inherit; border: 1px solid;\">4-4 stacked LSTM encoder and decoder<\/td>\n<td style=\"padding: inherit; border: 1px solid;\">39.51<\/td>\n<\/tr>\n<tr>\n<td style=\"padding: inherit; border: 1px solid;\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/dual-learning-machine-translation\/\">+Dual learning<\/a><\/td>\n<td style=\"padding: inherit; border: 1px solid;\">40.53<\/td>\n<\/tr>\n<tr>\n<td style=\"padding: inherit; border: 1px solid;\">+Dual 
learning + Deliberation Network<\/td>\n<td style=\"padding: inherit; border: 1px solid;\">41.5<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>Related<\/strong>:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/microsoft-research-nips-2017\/\">Microsoft Research at NIPS 2017<\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/deliberation-networks-sequence-generation-beyond-one-pass-decoding\/\">Deliberation Networks: Sequence Generation Beyond One-Pass Decoding<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>During the Tang dynasty of China, which lasted from 618 to 907, the poet Jia Dao was known for polishing his poems over and over to make them better and better. One famous story describes how he deliberated over two lines of a poem that read, \u201cBirds nestle in the trees by the pond. A [&hellip;]<\/p>\n","protected":false},"author":36509,"featured_media":447696,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"fetia","user_id":"36039"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194467],"tags":[],"research-area":[13556,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-444522","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artifical-intelligence","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"r
elated-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[425610],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"655\" height=\"280\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Li-Ning-Secluded-Residence-2-655x280.jpg\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Li-Ning-Secluded-Residence-2-655x280.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/12\/Li-Ning-Secluded-Residence-2-655x280-300x128.jpg 300w\" sizes=\"auto, (max-width: 655px) 100vw, 655px\" \/>","byline":"Fei Tian","formattedDate":"December 6, 2017","formattedExcerpt":"During the Tang dynasty of China, which lasted from 618 to 907, the poet Jia Dao was known for polishing his poems over and over to make them better and better. One famous story describes how he deliberated over two lines of a poem 
that&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/444522","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/36509"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=444522"}],"version-history":[{"count":36,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/444522\/revisions"}],"predecessor-version":[{"id":448623,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/444522\/revisions\/448623"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/447696"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=444522"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=444522"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=444522"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=444522"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=444522"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=444522"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/
msr-locale?post=444522"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=444522"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=444522"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=444522"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=444522"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
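The two-pass decoding flow the post describes (a first pass decoder drafts y' from the source x, and a second pass decoder revises the draft while seeing both x and y') can be sketched in a deliberately tiny, non-neural form. Everything below is a hypothetical stand-in: the function names and the toy lookup tables replace the real learned encoder/decoder networks, and only the data flow between the two passes mirrors the Deliberation Network.

```python
# Toy sketch of two-pass "deliberation" decoding. Illustrative only:
# the real Deliberation Network uses neural encoders/decoders with
# attention; here each stage is a simple table lookup.

def encode(source_tokens):
    # Stand-in for a neural encoder: expose the source tokens as the
    # "encodings" both decoders will read.
    return list(source_tokens)

def first_pass_decode(encodings, draft_table):
    # First pass decoder: produce a rough draft y' from the source encoding.
    return [draft_table.get(tok, tok) for tok in encodings]

def second_pass_decode(encodings, draft, revision_table):
    # Second pass decoder: reads BOTH the source encodings and the whole
    # draft y', so it can revise with global context. Revision here is a
    # lookup keyed on (source token, draft token) pairs.
    return [revision_table.get((src, d), d) for src, d in zip(encodings, draft)]

def deliberate_translate(source_tokens, draft_table, revision_table):
    x = encode(source_tokens)
    y_draft = first_pass_decode(x, draft_table)                # rough y'
    y_final = second_pass_decode(x, y_draft, revision_table)   # polished y
    return y_draft, y_final
```

The point of the structure is that `second_pass_decode` receives the complete draft, not just a prefix, which is what lets a deliberation step fix word choices using context to the right of the current position.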