{"id":680043,"date":"2020-08-04T09:33:34","date_gmt":"2020-08-04T16:33:34","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=680043"},"modified":"2020-10-06T15:59:25","modified_gmt":"2020-10-06T22:59:25","slug":"icml-2020-highlights-a-transformer-based-rl-agent-causal-ml-for-increased-privacy-and-more","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/icml-2020-highlights-a-transformer-based-rl-agent-causal-ml-for-increased-privacy-and-more\/","title":{"rendered":"ICML 2020 highlights: A Transformer-based RL agent, causal ML for increased privacy, and more"},"content":{"rendered":"\n<figure class=\"wp-block-image alignwide size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"577\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1024x577.png\" alt=\"\" class=\"wp-image-682260\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1024x577.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1536x865.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1280x720.png 1280w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero.png 1643w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>With over 50 papers from Microsoft accepted at this year\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/icml.cc\/Conferences\/2020\">International Conference on Machine Learning (ICML 2020)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a number of which were presented in virtual workshops, Microsoft researchers are in full summer swing when it comes to advancing machine learning in accessibility, privacy, healthcare, and other areas. As Microsoft Partner Research Manager and ICML President <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcl\/\">John Langford<\/a> puts it, \u201cICML is a very broad conference, so its specialty is in some sense \u2018all of the above.\u2019\u201d But Langford goes on to add that one of the topics that ICML has a long track record on is currently trending: reinforcement learning. A brief glance through the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/icml-2020\/#!sessions\">sessions <\/a>and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/icml-2020\/#!workshops\">workshops <\/a>presented by Microsoft researchers shows the wide influence reinforcement learning has in our world today, from natural language to robotics to infrastructure considerations like transportation.<\/p>\n\n\n\n<p>Beyond the research contributions, Microsoft was also a sponsor of and recruiter at the conference. 
Additionally, the company sponsored two events co-located with the conference, the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/wimlworkshop.org\/icml2020\/\">first Women in Machine Learning Un-Workshop<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/sites.google.com\/view\/queer-in-ai\/icml-2020\">the fourth Queer in AI Workshop<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. The impact of the conference\u2014now and in the future\u2014is multifaceted, according to Langford. \u201cICML is \u2018the\u2019 summer machine learning conference. As such, it\u2019s critically important to the academic discovery, review, and dissemination process, a great way to meet fellow researchers, and a natural recruiting point for the field,\u201d he says.<\/p>\n\n\n\n<p>Below&nbsp;are&nbsp;five&nbsp;selections&nbsp;of research presented by&nbsp;Microsoft.&nbsp;These projects highlight how broadly&nbsp;researchers are thinking about&nbsp;ML&nbsp;and its implications for society.&nbsp;But this diverse group of papers&nbsp;represents&nbsp;only a small slice of the&nbsp;advancements presented by&nbsp;Microsoft&nbsp;researchers.&nbsp;Explore the&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/icml-2020\/#!accepted-papers\" target=\"_blank\">Microsoft at&nbsp;ICML&nbsp;2020&nbsp;accepted papers list<\/a>&nbsp;to learn about&nbsp;further&nbsp;research contributions.&nbsp;<\/p>\n\n\n<p>See sections on: <u>ICML 2020 Overview<\/u> | <a href=\"#AIModels\">How AI models reason <\/a> | <a href=\"#Utility\">Utility and privacy with causal machine learning<\/a> | <a href=\"#Transformers\">Using Transformers to create RL agents<\/a> | <a href=\"#Pretraining\">Pretraining for bidirectional 
language models<\/a> | <a href=\"#Normalization\">Identifying layer normalization location<\/a><\/p>\n\n\n<figure class=\"wp-block-image alignwide size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"363\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/07\/Neural-network-3_-shorter.png\" alt=\"\" class=\"wp-image-681039\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/07\/Neural-network-3_-shorter.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/07\/Neural-network-3_-shorter-300x106.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/07\/Neural-network-3_-shorter-768x272.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div id=\"AIModels\" style=\"height: 30px;\"><\/div>\n\n<h3 class=\"wp-block-heading\">Understanding how AI models reason about what they&nbsp;see&nbsp;<\/h3>\n\n\n\n<p><strong>Bottom line:<\/strong>&nbsp;\u201cWe propose a principled approach to isolate, analyze, and interpret how visual&nbsp;question-answering&nbsp;models reason about what they see.\u201d&nbsp;<br>     \u2014Machine Learning Scientist&nbsp;<a href=\"https:\/\/www.microsoft.com\/applied-sciences\/people\/saeed-amizadeh\">Saeed&nbsp;Amizadeh<\/a><\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication <\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/neuro-symbolic-visual-reasoning-disentangling-visual-from-reasoning\/\" data-bi-cN=\"Neuro-Symbolic Visual Reasoning: Disentangling \u201cVisual\u201d from \u201cReasoning\u201d\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" 
data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Neuro-Symbolic Visual Reasoning: Disentangling \u201cVisual\u201d from \u201cReasoning\u201d<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p><strong>Quick glance:<\/strong>&nbsp;In&nbsp;<a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/neuro-symbolic-visual-reasoning-disentangling-visual-from-reasoning\/\" target=\"_blank\" rel=\"noreferrer noopener\">\u201cNeuro-Symbolic Visual Reasoning: Disentangling \u2018Visual\u2019 from&nbsp;\u2018Reasoning,\u2019\u201d<\/a>&nbsp;researchers from&nbsp;the&nbsp;Microsoft Applied Sciences Lab&nbsp;and&nbsp;MSR AI&nbsp;collaborated to combine&nbsp;visual understanding&nbsp;and&nbsp;neuro-symbolic&nbsp;reasoning&nbsp;with&nbsp;natural language processing and program synthesis.&nbsp;\u201cWe develop a&nbsp;novel way to perform differentiable logical inference over visual scenes, which allows us to disentangle the processes of reasoning and perception in visual question answering (VQA) models,\u201d explains&nbsp;Amizadeh.&nbsp;The work also led to creating a methodology for evaluating state-of-the-art VQA models, and the researchers propose expanding beyond pure probabilistic logical reasoning to incorporate other contextual signals and improve visual perception of the models. 
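<\/p>\n\n\n\n<p>As a toy illustration (not the paper\u2019s actual formulation), differentiable logical inference can be sketched by keeping truth values in [0, 1], so that logical operators become smooth functions and gradients can flow from the answer back into perception. In the sketch below, <code>is_red<\/code> and <code>is_cube<\/code> stand in for per-object probabilities from a hypothetical perception module:<\/p>

```python
import numpy as np

# Soft (differentiable) logic under product-style semantics: truth values
# live in [0, 1], so AND/OR/NOT become smooth functions of their inputs.
def soft_and(p, q):
    return p * q

def soft_or(p, q):
    return p + q - p * q

def soft_not(p):
    return 1.0 - p

def soft_exists(ps):
    # "Some object satisfies the predicate": a soft OR over all objects.
    return 1.0 - np.prod(1.0 - ps)

# Toy scene: per-object probabilities from a hypothetical perception module.
is_red = np.array([0.9, 0.1, 0.2])
is_cube = np.array([0.8, 0.7, 0.1])

# "Is there a red cube?" evaluated softly over the whole scene.
answer = soft_exists(soft_and(is_red, is_cube))
```

<p>Because every operation here is differentiable, an error signal on the final answer can propagate back into the perception probabilities, which is the property that lets perception and reasoning be disentangled yet trained jointly.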
<\/p>\n\n\n\n<p><strong>Areas of impact:&nbsp;<\/strong>This research lies at the intersection of natural language and visual perception, which makes it a good candidate for systems using AI for accessibility. Key to the work is its focus on interpretability: users should be able to understand, at every step, how the neural reasoning connects what the model \u201csees\u201d to language, building trust and reliability in AI.&nbsp;<\/p>\n\n\n\n<p><strong>Fun fact:<\/strong>&nbsp;This work began as a project in the Microsoft AI Residency Program.&nbsp;<\/p>\n\n\n\n<p><strong>The research team:  <\/strong><a href=\"https:\/\/www.microsoft.com\/applied-sciences\/people\/saeed-amizadeh\">Saeed&nbsp;Amizadeh<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/hpalangi\/\">Hamid Palangi<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/polozov\/\">Alex Polozov<\/a>, Yichen Huang, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/kazukoi\/\">Kazuhito Koishida<\/a><\/p>\n\n\n\n<p><strong>Additional Resources:&nbsp;<\/strong>&nbsp;<br>&nbsp;<br><a href=\"https:\/\/www.microsoft.com\/applied-sciences\/\">Applied Sciences homepage&nbsp;<\/a><br>&nbsp;<br><a rel=\"noreferrer noopener\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-ai\/\" target=\"_blank\">MSR AI homepage<\/a>&nbsp;&nbsp;<br>&nbsp;<br><a rel=\"noreferrer noopener\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/academic-program\/microsoft-ai-residency-program\/\" target=\"_blank\">Microsoft AI Residency Program<\/a><\/p>\n\n\n\n<div id=\"Utility\" style=\"height: 30px;\"><\/div>\n\n<h3 class=\"wp-block-heading\">Improving utility and privacy&nbsp;with causal machine&nbsp;learning<\/h3>\n\n\n\n<p><strong>Bottom line:&nbsp;<\/strong>\u201c<em>What if you can build&nbsp;machine learning&nbsp;models&nbsp;that are both accurate and preserve 
privacy of individuals?<\/em> Try causal predictive models:&nbsp;We&nbsp;show that they are more robust to privacy attacks like membership inference and have higher accuracy&nbsp;on new domains&nbsp;than typical ML models.\u201d&nbsp;<br>     \u2014Microsoft&nbsp;Senior Researchers&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/amshar\/\">Amit Sharma&nbsp;<\/a>and&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shtople\/\">Shruti&nbsp;Tople<\/a><\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication <\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/alleviating-privacy-attacks-via-causal-learning\/\" data-bi-cN=\"Alleviating Privacy Attacks via Causal Learning\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Alleviating Privacy Attacks via Causal Learning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p><strong>Quick glance:&nbsp;<\/strong>Privacy is paramount for&nbsp;institutions&nbsp;like hospitals&nbsp;and&nbsp;governments,&nbsp;which&nbsp;handle sensitive datasets and use ML models. 
Standard&nbsp;ML&nbsp;privacy approaches&nbsp;add noise to a model or data to protect information, but this can have the undesired effect of reducing accuracy or utility of the model.&nbsp;This work shows that causal&nbsp;learning,&nbsp;by which&nbsp;ML models are trained based on domain knowledge about causal relationships between features and outcomes, can increase both privacy and utility when compared to associational ML models&nbsp;with the same&nbsp;amount of noise.&nbsp;Researchers from&nbsp;Microsoft Research India<strong>&nbsp;<\/strong>provided knowledge of causal ML for this project, while researchers from&nbsp;Microsoft Research Cambridge&nbsp;brought expertise on privacy and security. Their paper is called <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/alleviating-privacy-attacks-via-causal-learning\/\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">\u201cAlleviating Privacy Attacks via Causal Learning.&#8221;<\/a><\/p>\n\n\n\n<p><strong>Areas of impact:&nbsp;<\/strong>This work aims to improve privacy protections for institutions&nbsp;using&nbsp;sensitive data with causal ML. 
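<\/p>\n\n\n\n<p>The membership-inference attack the researchers defend against can be pictured with a toy loss-thresholding adversary (a hypothetical illustration, not the paper\u2019s experimental setup): the attacker guesses \u201cthis example was in the training set\u201d whenever the model\u2019s loss on it is low, so any generalization gap between training and test losses leaks membership:<\/p>

```python
import numpy as np

# Toy membership-inference attack: predict "member" when the model's loss on
# an example falls below a threshold. The attack succeeds only to the extent
# that training losses are systematically lower than test losses.
def membership_attack_accuracy(train_losses, test_losses, threshold):
    correct_members = np.sum(train_losses < threshold)     # members flagged
    correct_nonmembers = np.sum(test_losses >= threshold)  # non-members passed
    return (correct_members + correct_nonmembers) / (
        len(train_losses) + len(test_losses))

rng = np.random.default_rng(0)

# An overfit model: much lower loss on training examples, so membership leaks.
overfit_acc = membership_attack_accuracy(
    rng.exponential(0.1, size=1000), rng.exponential(1.0, size=1000),
    threshold=0.3)

# A well-generalizing model: train and test losses look alike, so the attack
# degrades toward chance accuracy (0.5).
robust_acc = membership_attack_accuracy(
    rng.exponential(0.5, size=1000), rng.exponential(0.5, size=1000),
    threshold=0.3)
```

<p>The paper\u2019s claim, in these terms, is that causal predictive models keep the two loss distributions close, so attacks of this kind stay near chance accuracy.<\/p>\n\n\n\n<p>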
In addition, this direction allows for improved model sharing across institutions and allows individuals to voluntarily share their own data without risk of information being leaked by an ML model.&nbsp;&nbsp;<\/p>\n\n\n\n<p><strong>New tools:&nbsp;<\/strong>The researchers have released an <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/robustdg\">open-source toolkit<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, RobustDG,&nbsp;for evaluating causal ML models on privacy, robustness, and out-of-distribution accuracy.<\/p>\n\n\n\n<p><strong>The research team: <\/strong><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/amshar\/\" target=\"_blank\" rel=\"noreferrer noopener\">Amit Sharma<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shtople\/\" target=\"_blank\" rel=\"noreferrer noopener\">Shruti&nbsp;Tople<\/a>, and&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/adityan\/\" target=\"_blank\" rel=\"noreferrer noopener\">Aditya Nori<\/a><\/p>\n\n\n\n<p><strong>Additional Resources:<\/strong>&nbsp;<br>&nbsp;<br><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/github.com\/microsoft\/robustdg\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository including open-source toolkit<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br>&nbsp;<br><a rel=\"noreferrer noopener\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-india\/\" target=\"_blank\">Microsoft Research India homepage<\/a>&nbsp;<br>&nbsp;<br><a rel=\"noreferrer noopener\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-cambridge\/\" target=\"_blank\">Microsoft Research Cambridge homepage<\/a>&nbsp;&nbsp;<\/p>\n\n\n\n<div 
id=\"Transformers\" style=\"height: 30px;\"><\/div>\n\n<h3 class=\"wp-block-heading\">Using Transformers to create&nbsp;RL&nbsp;agents&nbsp;suited&nbsp;for real-world&nbsp;tasks<\/h3>\n\n\n\n<p><strong>Bottom line:&nbsp;<\/strong>\u201cTransformers for RL!\u201d&nbsp;<br>     \u2014Senior Research Software Engineer&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/riloynd\/\" target=\"_blank\">Ricky Loynd<\/a><\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/working-memory-graphs\/\" data-bi-cN=\"Working Memory Graphs\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Working Memory Graphs<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p><strong>Quick glance:&nbsp;<\/strong>\u201c<a rel=\"noreferrer noopener\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/working-memory-graphs\/\" target=\"_blank\">Working Memory Graphs<\/a>\u201d&nbsp;presents a new reinforcement learning agent \u201cthat accelerates learning on challenging tasks by leveraging the power of Transformers in three ways,\u201d&nbsp;explains&nbsp;Loynd.&nbsp;These three approaches apply Transformer attention to past observations, recurrent state vectors, and factored observations,&nbsp;respectively. 
\u201cBy leveraging the power of Transformers in these ways, our Working Memory Graph (WMG) agent accelerates learning on several challenging tasks:\u202fBabyAI, Pathfinding, and Sokoban. In&nbsp;BabyAI, WMG achieves drastic improvements in sample efficiency when observations are factored into more succinct representations,\u201d says Loynd.&nbsp;The team includes&nbsp;members from the reinforcement learning and&nbsp;deep learning&nbsp;groups&nbsp;within&nbsp;MSR AI.<\/p>\n\n\n\n<p><strong>Areas of impact:&nbsp;<\/strong>This work shows that WMG is effective in handling&nbsp;the&nbsp;structured, factored&nbsp;observations used in today\u2019s real-world applications of RL and accelerates RL so that AI agents will eventually be able to accomplish&nbsp;previously unattainable&nbsp;real-world tasks.<\/p>\n\n\n\n<p><strong>Performance and novel features:&nbsp;<\/strong>WMG outperforms a&nbsp;GRU&nbsp;(Gated Recurrent Unit)&nbsp;baseline agent at complex reasoning over past observations, and WMG has a new form of \u201cshortcut recurrence\u201d that proves to be more effective than standard gated recurrence. Sokoban results demonstrate that WMG performs better on this complex domain than the state-of-the-art&nbsp;Deep Repeated ConvLSTM (DRC) agent (by Google DeepMind)&nbsp;throughout 20 million steps of training.<\/p>\n\n\n\n<p><strong>The research team: <\/strong><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/riloynd\/\">Ricky Loynd<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rfernand\/\">Roland Fernandez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/aslicel\/\">Asli Celikyilmaz<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/adswamin\/\">Adith Swaminathan<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mahauskn\/\">Matthew Hausknecht<\/a>. 
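<\/p>\n\n\n\n<p>The core mechanism can be sketched in a few lines (a simplified illustration, not the released WMG code): rather than compressing history into a single recurrent vector, the agent keeps recent observations and a handful of state vectors as separate memory slots and lets Transformer-style attention read over all of them at every step:<\/p>

```python
import numpy as np

# Single-head, single-query attention over a set of memory slots.
def attend(query, memory):
    # query: (d,); memory: (n_slots, d) -> a weighted summary of the slots.
    scores = memory @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory

d = 8
rng = np.random.default_rng(0)
# Hypothetical memory: e.g. four factored observations plus two state vectors.
memory = rng.normal(size=(6, d))
query = rng.normal(size=d)
summary = attend(query, memory)  # shape (d,)
```

<p>Because each slot stays separate until attention combines them, the agent can reach back to a specific past observation directly rather than hoping it survived compression into one recurrent vector, which is consistent with the sample-efficiency gains reported on factored observations.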
<\/p>\n\n\n\n<p><strong>Additional Resources:<\/strong>&nbsp;<br>&nbsp;<br><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/github.com\/microsoft\/wmg_agent\" target=\"_blank\">Working Memory Graph GitHub repository<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;<br>&nbsp;<br><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">MSR AI homepage<\/a><\/p>\n\n\n\n<div id=\"Pretraining\" style=\"height: 30px;\"><\/div>\n\n<h3 class=\"wp-block-heading\">Efficient pretraining&nbsp;for&nbsp;bidirectional language models in one forward&nbsp;pass<\/h3>\n\n\n\n<p><strong>Bottom line:&nbsp;<\/strong>\u201cOur work efficiently realizes unified pretraining of bidirectional language models&nbsp;(via autoencoding) and sequence-to-sequence language models&nbsp;(via&nbsp;partially autoregressive) with&nbsp;a&nbsp;pseudo-masked&nbsp;language&nbsp;model for language understanding and&nbsp;generation.\u201d&nbsp;&nbsp;<br>     \u2014Senior Principal Research Manager <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/fuwei\/\">Furu&nbsp;Wei<\/a><\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/unilmv2-pseudo-masked-language-models-for-unified-language-model-pre-training\/\" data-bi-cN=\"UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>UniLMv2: 
Pseudo-Masked Language Models for Unified Language Model Pre-Training<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p><strong>Quick glance:&nbsp;<\/strong>This research&nbsp;introduces pseudo-masked language models, allowing for efficient pretraining of bidirectional language models in natural language&nbsp;understanding and sequence-to-sequence language models in natural language generation in one forward pass.&nbsp;This work is a collaboration between Microsoft Research Asia, Microsoft Research Redmond, and both the DeepSpeed and Project Turing teams, who&nbsp;help&nbsp;scale up the&nbsp;pretraining to larger&nbsp;models and are working to implement those models&nbsp;in Microsoft products&nbsp;in an initiative called AI at Scale.&nbsp;The paper is&nbsp;titled&nbsp;\u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/unilmv2-pseudo-masked-language-models-for-unified-language-model-pre-training\/\" target=\"_blank\" rel=\"noreferrer noopener\">UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training<\/a>.\u201d<\/p>\n\n\n\n<p><strong>Areas of impact:&nbsp;<\/strong>This novel language model improves techniques for natural language generation, including document summarization and dialog generation. 
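<\/p>\n\n\n\n<p>The two pretraining objectives named above can be pictured as two self-attention masks over one token sequence (a simplified sketch of the unified-masking idea; UniLMv2\u2019s pseudo masks, which let both objectives share a single forward pass, add more machinery). Here <code>mask[i][j] = 1<\/code> means token i may attend to token j:<\/p>

```python
import numpy as np

# Autoencoding (bidirectional) objective: every token sees the whole sequence.
def bidirectional_mask(n):
    return np.ones((n, n), dtype=int)

# Sequence-to-sequence objective: source tokens see the full source; target
# tokens additionally see only earlier target tokens (autoregressive order).
def seq2seq_mask(n_src, n_tgt):
    n = n_src + n_tgt
    mask = np.zeros((n, n), dtype=int)
    mask[:, :n_src] = 1  # every token attends to the full source segment
    mask[n_src:, n_src:] = np.tril(np.ones((n_tgt, n_tgt), dtype=int))
    return mask

m = seq2seq_mask(3, 2)  # 3 source tokens, 2 target tokens
```

<p>Training both masking regimes over shared parameters is what \u201cunified\u201d pretraining refers to; the pseudo-masking trick is what lets both losses be computed in one forward pass instead of two.<\/p>\n\n\n\n<p>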
It also builds on techniques for&nbsp;natural language understanding, which includes text classification, question answering, and information extraction.&nbsp;&nbsp;<\/p>\n\n\n\n<p><strong>New state of the art:&nbsp;<\/strong>Results show this model achieves state of the art on various natural language generation and understanding tasks&nbsp;across&nbsp;numerous benchmarks.<\/p>\n\n\n\n<p><strong>The research&nbsp;team:&nbsp;<\/strong><\/p>\n\n\n\n<p><strong>MSR Asia:&nbsp;<\/strong><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/lidong1\/\">Li Dong<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/fuwei\/\">Furu&nbsp;Wei<\/a>,&nbsp;Wenhui&nbsp;Wang,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nanya\/\">Nan Yang<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mingzhou\/\">Ming Zhou<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/hon\/\">Hsiao-Wuen&nbsp;Hon<\/a>, and collaborators Hangbo&nbsp;Bao&nbsp;and&nbsp;Songhao&nbsp;Piao<\/p>\n\n\n\n<p><strong>MSR Redmond:&nbsp;<\/strong><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaodl\/\">Xiaodong&nbsp;Liu,<\/a> <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yuwwan\/\">Yu Wang<\/a>,&nbsp;and&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao&nbsp;<\/a><\/p>\n\n\n\n<p><strong>Additional resources: <\/strong><\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/unilm\">GitHub repository of UniLM&nbsp;<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/ai-at-scale\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI at Scale&nbsp;homepage<\/a>&nbsp;<\/p>\n\n\n\n<p><a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-asia\/\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Research Asia homepage<\/a>&nbsp;<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-redmond\/\">Microsoft Research Redmond homepage&nbsp;<\/a><\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.deepspeed.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">DeepSpeed homepage<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;<\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/msturing.org\/\">Project Turing homepage<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<div id=\"Normalization\" style=\"height: 30px;\"><\/div>\n\n<h3 class=\"wp-block-heading\">Correctly identifying layer normalization location for better Transformer optimization<\/h3>\n\n\n\n<p><strong>Bottom line:<\/strong>&nbsp;\u201cUse Pre-LN Transformer to remove the annoying warm-up stage and save greatly on convergence time.\u201d<br>     \u2014 Microsoft Researchers <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dihe\/\">Di He<\/a> and&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shuz\/\">Shuxin&nbsp;Zheng<\/a><\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--left\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/on-layer-normalization-in-the-transformer-architecture\/\" data-bi-cN=\"On Layer Normalization in the Transformer Architecture\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>On Layer Normalization in the Transformer Architecture<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p><strong>Quick glance:&nbsp;<\/strong>This research explores a known optimization issue with the original (Post-LN) Transformer, as used in BERT, that slows down training and requires careful hyperparameter tuning, including a learning-rate warm-up stage. The researchers offer theoretical proof that the issue stems from the location of layer normalization, and they study the Pre-LN Transformer, a variant that relocates layer normalization so that optimization is easier and training converges quickly without warm-up. This work was done by researchers affiliated with Microsoft Research Asia, the Chinese Academy of Sciences, and Peking University. The research is detailed in the paper \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/on-layer-normalization-in-the-transformer-architecture\/\" target=\"_blank\" rel=\"noreferrer noopener\">On Layer Normalization in the Transformer Architecture<\/a>.\u201d&nbsp;&nbsp;<\/p>\n\n\n\n<p><strong>Areas of impact:&nbsp;<\/strong>Many projects already use the Pre-LN Transformer to train large-scale language models because of its exceptional optimization stability, including NVIDIA\u2019s Megatron, OpenAI\u2019s GPT-2, and OpenAI\u2019s GPT-3 models.<\/p>\n\n\n\n<p><strong>Added benefits:&nbsp;<\/strong>Because of the way this variant operates, 
it requires no additional hyperparameter tuning.&nbsp;This&nbsp;fact,&nbsp;combined with faster convergence,&nbsp;results in boosted energy efficiency.&nbsp;&nbsp;<\/p>\n\n\n\n<p><strong>The research&nbsp;team:<\/strong>&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dihe\/\" target=\"_blank\" rel=\"noreferrer noopener\">Di He<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shuz\/\" target=\"_blank\" rel=\"noreferrer noopener\">Shuxin&nbsp;Zheng<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/huzhang\/\" target=\"_blank\" rel=\"noreferrer noopener\">Huishuai&nbsp;Zhang<\/a>,&nbsp;and&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/tyliu\/\" target=\"_blank\" rel=\"noopener noreferrer\">Tie-Yan Liu<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;&nbsp;<\/p>\n\n\n\n<p><strong>Additional Resources:<\/strong><strong>&nbsp;<\/strong>&nbsp;<br>&nbsp;<br><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/ai-at-scale\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI at Scale homepage<\/a>&nbsp;<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-asia\/\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Research Asia homepage<\/a>&nbsp;<\/p>\n\n\n\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/machine-translation-2\/\">Neural Machine Translation&nbsp;<\/a><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Explore more <\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/icml-2020\/#!accepted-papers\" data-bi-cN=\"ICML 
2020 accepted papers \" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>ICML 2020 accepted papers <\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t\t\t<p class=\"annotations__caption text-neutral-400 mt-2\">Check out the complete list of accepted papers from Microsoft Research at ICML 2020 <\/p>\n\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>With over 50 papers from Microsoft accepted at this year\u2019s International Conference on Machine Learning (ICML 2020) (opens in new tab), a number of which were presented in virtual workshops, Microsoft researchers are in full summer swing when it comes to advancing machine learning in accessibility, privacy, healthcare, and other areas. As Microsoft Partner Research [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":682260,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13561,13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-680043","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560,199561],"msr_impact_theme":[],"related-publications":[
],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[558228],"related-events":[670011],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-960x540.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1024x577.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1536x865.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/ICML-Hero.png 1643w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"August 4, 2020","formattedExcerpt":"With over 50 papers from Microsoft accepted at this year\u2019s International Conference on Machine Learning (ICML 2020) (opens in new tab), a number of which were presented in virtual workshops, Microsoft researchers are in full summer swing when it comes to advancing machine 
learning in&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/680043","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=680043"}],"version-history":[{"count":35,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/680043\/revisions"}],"predecessor-version":[{"id":696453,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/680043\/revisions\/696453"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/682260"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=680043"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=680043"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=680043"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=680043"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=680043"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=680043"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/w
p\/v2\/msr-locale?post=680043"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=680043"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=680043"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=680043"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=680043"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}