{"id":778711,"date":"2021-09-28T08:00:00","date_gmt":"2021-09-28T15:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=778711"},"modified":"2021-09-28T14:42:16","modified_gmt":"2021-09-28T21:42:16","slug":"microsoft-turing-universal-language-representation-model-t-ulrv5-tops-xtreme-leaderboard-and-trains-100x-faster","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-turing-universal-language-representation-model-t-ulrv5-tops-xtreme-leaderboard-and-trains-100x-faster\/","title":{"rendered":"Microsoft Turing Universal Language Representation model, T-ULRv5, tops XTREME leaderboard and trains 100x faster"},"content":{"rendered":"\n<p>Today, <strong>we are excited to announce that with our latest Turing universal language representation model (T-ULRv5), a Microsoft-created model is once again the state of the art and at the top of the Google<\/strong> <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/sites.research.google\/xtreme\"><strong>XTREME public leaderboard<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Resulting from a collaboration between the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/turing.microsoft.com\/\">Microsoft Turing team<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/\">Microsoft Research<\/a>, the 2.2 billion-parameter T-ULRv5 XL outperforms the current 2<sup>nd<\/sup> best model by an average score of 1.7 points. It is also the state of the art across each of the four subcategories of tasks on the leaderboard. 
These results demonstrate the strong capabilities of T-ULRv5, which, in addition to being more capable, <strong>trains 100 times faster than its predecessors<\/strong>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a href=\"https:\/\/sites.research.google\/xtreme\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"672\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig1-XTREME-leaderboard-snip-1024x672.png\" alt=\"Figure 1: XTREME leaderboard showing T-ULRv5 at the top.\" class=\"wp-image-778735\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig1-XTREME-leaderboard-snip-1024x672.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig1-XTREME-leaderboard-snip-300x197.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig1-XTREME-leaderboard-snip-768x504.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig1-XTREME-leaderboard-snip-240x158.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig1-XTREME-leaderboard-snip.png 1092w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption> Figure 1: XTREME leaderboard showing T-ULRv5 at the top. <\/figcaption><\/figure><\/div>\n\n\n\n<p>This marks a return to the top of this leaderboard for Microsoft. We have previously held the top position with Turing ULRv2 and other submissions. 
To reach this latest achievement, we scaled up our recent research on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/2106.16138.pdf\">XLM-E<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> into the 2.2 billion-parameter XL model and coupled it with breakthroughs across data, architecture, and optimization strategies to produce the final pretrained model. We also deployed our advanced fine-tuning technique, called <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/consistency-regularization-for-cross-lingual-fine-tuning\/\">XTune<\/a>.<\/p>\n\n\n\n<h2 id=\"xtreme-benchmark\">XTREME benchmark<\/h2>\n\n\n\n<p>The&nbsp;<strong>C<\/strong>ross-lingual&nbsp;<strong>TR<\/strong>ansfer&nbsp;<strong>E<\/strong>valuation of&nbsp;<strong>M<\/strong>ultilingual&nbsp;<strong>E<\/strong>ncoders (<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/sites.research.google\/xtreme\">XTREME<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>) benchmark covers 40 typologically diverse languages that span 12 language families and includes nine tasks that require reasoning about different levels of syntax or semantics. The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data.<\/p>\n\n\n\n<p>The tasks included in XTREME cover a range of paradigms, including sentence classification, structured prediction, sentence retrieval, and cross-lingual question answering. 
Consequently, for models to be successful on the XTREME benchmarks, they must learn representations that generalize to many standard cross-lingual transfer settings.<\/p>\n\n\n\n<p>For a full description of the benchmark, languages, and tasks, please see the paper,&nbsp;\u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2003.11080\">XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u201d<\/p>\n\n\n\n<h2 id=\"t-ulrv5-an-advancement-across-multiple-ai-axes\">T-ULRv5: An advancement across multiple AI axes<\/h2>\n\n\n\n<p>T-ULRv5 is the latest addition to the family of Turing models, which represent a foundational part of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/innovation.microsoft.com\/en-us\/ai-at-scale\">Microsoft AI at Scale<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. This cross-lingual model incorporates our recent research on XLM-E and is capable of encoding text from 94 languages and representing them in a shared vector space. In the realm of big neural network models, the research frontier has many axes of exploration. The most common axis is the model size, where larger models tend to perform better than their smaller counterparts.<\/p>\n\n\n\n<p>However, increasing model size without innovations along other axes\u2014like better vocabulary, higher quality data, novel training tasks and objectives, innovative network architecture, and training optimizations\u2014usually results in highly inefficient use of expensive compute for a marginally better model. 
We have introduced and incorporated breakthrough innovations across <strong>all<\/strong> these axes to make T-ULRv5 a high-quality, highly efficient model.<\/p>\n\n\n\n<p>Besides its size, T-ULRv5 introduces some key differences and innovations that set it apart from other pretrained multilingual language models and lead to state-of-the-art performance and greatly improved training efficiency.<\/p>\n\n\n\n<h3 id=\"model-architecture-pretraining-and-tasks\">Model architecture, pretraining, and tasks<\/h3>\n\n\n\n<p>T-ULRv5 shares the transformer architecture that&#8217;s popular among the emerging <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/2108.07258.pdf\">foundation models<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and multilingual models like mBERT, mT5, XLM-R, and the previous version, T-ULRv2. Specifically, T-ULRv5 XL, the largest variant we pretrained, has 48 transformer layers, a hidden dimension size of 1,536, 24 attention heads, a 500,000-token multilingual vocabulary, and a total parameter count of 2.2 billion.<\/p>\n\n\n\n<p>The technology behind T-ULRv5, XLM-E, takes inspiration from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2003.10555v1\">ELECTRA<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and is a departure from the previously described <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/infoxlm-an-information-theoretic-framework-for-cross-lingual-language-model-pre-training\/\">InfoXLM<\/a>. 
It moves away from InfoXLM\u2019s MMLM (Multilingual Masked Language Modeling) and TLM (Translation Language Modeling) pretraining tasks and adopts two new tasks\u2014MRTD (Multilingual Replaced Token Detection) and TRTD (Translation Replaced Token Detection)\u2014with the goal of distinguishing real input tokens from corrupted tokens.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Figure 2: Multilingual Replaced Token Detection (MRTD) pretraining task. A generator predicts masked tokens in the input and a discriminator predicts whether each token was replaced by a generator sample.\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig2-MRTD-chart.png\"><img loading=\"lazy\" decoding=\"async\" width=\"580\" height=\"406\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig2-MRTD-chart.png\" alt=\"Figure 2: Multilingual Replaced Token Detection (MRTD) pretraining task. A generator predicts masked tokens in the input and a discriminator predicts whether each token was replaced by a generator sample.\" class=\"wp-image-778738\" title=\"Multilingual Replaced Token Detection (MRTD) pretraining task\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig2-MRTD-chart.png 580w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig2-MRTD-chart-300x210.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig2-MRTD-chart-240x168.png 240w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><figcaption>Figure 2: Multilingual Replaced Token Detection (MRTD) pretraining task. A generator predicts masked tokens in the input and a discriminator predicts whether each token was replaced by a generator sample.<\/figcaption><\/figure><\/div>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Figure 3: The Translation Replaced Token Detection (TRTD) pretraining task. 
A generator predicts masked tokens on translation pairs in the input, and a discriminator predicts whether each token was replaced by a generator sample.\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig3-TRTD.png\"><img loading=\"lazy\" decoding=\"async\" width=\"579\" height=\"386\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig3-TRTD.png\" alt=\"Figure 3: The Translation Replaced Token Detection (TRTD) pretraining task. A generator predicts masked tokens on translation pairs in the input, and a discriminator predicts whether each token was replaced by a generator sample.\" class=\"wp-image-778741\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig3-TRTD.png 579w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig3-TRTD-300x200.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig3-TRTD-240x160.png 240w\" sizes=\"auto, (max-width: 579px) 100vw, 579px\" \/><\/a><figcaption> Figure 3: The Translation Replaced Token Detection (TRTD) pretraining task. A generator predicts masked tokens on translation pairs in the input, and a discriminator predicts whether each token was replaced by a generator sample.<\/figcaption><\/figure><\/div>\n\n\n\n<p>Like ELECTRA, T-ULRv5 training involves two transformer encoders, serving as generator and discriminator, respectively. Unlike ELECTRA, which was trained on only English datasets, T-ULRv5 was trained on large-scale multilingual datasets, including parallel text corpora. We encourage the model to better learn cross-lingual alignment and shared representation by making the generator predict the masked tokens on translation pairs in addition to monolingual input. 
After the pretraining is complete, only the discriminator is used as the text encoder for fine-tuning on downstream tasks.<\/p>\n\n\n\n<h3 id=\"100x-better-training-efficiency\">100x better training efficiency<\/h3>\n\n\n\n<p>Existing approaches for cross-lingual pretraining based on Masked Language Modeling (MLM) usually require massive computation resources, rendering such models quite expensive. In contrast, XLM-E trains significantly faster and outperforms the baseline models on various cross-lingual understanding tasks at a much lower computation cost. For example, with the same corpora, code base, and model size (12 layers), we compared XLM-E (indicated with a red line in Figure 4) with an in-house version of Facebook\u2019s popular multilingual <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/1911.02116.pdf\">XLM-R<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> model augmented with translation language modeling (XLM-R + TLM, indicated with a blue line in Figure 4).<\/p>\n\n\n\n<p><strong>We observed a 130x training speedup with XLM-E in reaching the same XNLI accuracy<\/strong>, with the 12-layer base XLM-E model completing its training in only 1.7 days on 64 NVIDIA A100 GPUs. At 2.2 billion parameters, the top-performing T-ULRv5 XL model benefited from the much-improved training efficiency of XLM-E and finished its training in less than two weeks on 256 NVIDIA A100 GPUs. 
The combination of the new TRTD task with the RTD task, along with changes in network architecture, accelerated convergence and improved model quality.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Figure 4: XLM-E matching XLM-R+TLM on XNLI accuracy\u2014130x faster.\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig4-XLM-E.png\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"462\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig4-XLM-E.png\" alt=\"Figure 4: XLM-E matching XLM-R+TLM on XNLI accuracy\u2014130x faster.\" class=\"wp-image-778744\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig4-XLM-E.png 624w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig4-XLM-E-300x222.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig4-XLM-E-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/T-ULRv5-fig4-XLM-E-240x178.png 240w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/a><figcaption>Figure 4: XLM-E matching XLM-R+TLM on XNLI accuracy\u2014130x faster.<\/figcaption><\/figure><\/div>\n\n\n\n<h3 id=\"multilingual-training-data\">Multilingual training data<\/h3>\n\n\n\n<p>Part of the T-ULRv5 quality improvement comes from better training data and a bigger vocabulary. Training a 2.2 billion-parameter model supporting 94 languages requires larger, higher-quality datasets. 
Multilingual corpora, many from the web, usually have a large representation disparity between high-resource and low-resource languages, particularly in data volume, cleanliness, and diversity.<\/p>\n\n\n\n<p>In addition, parallel language corpora that consist of translated text pairs can also suffer from mixed translation quality and alignment issues, negatively affecting the resulting model performance. We put significant effort into data engineering and cleaning steps to produce high-quality datasets at scale to support T-ULRv5 training.<\/p>\n\n\n\n<h3 id=\"expanded-vocabulary\">Expanded vocabulary<\/h3>\n\n\n\n<p>Along with the dataset updates, we also constructed a new vocabulary of 500,000 tokens, twice as large as that of T-ULRv2, which further improved model performance on all languages. Increasing vocabulary size and retraining with a fair representation of all languages is not a trivial task. We describe our method and results for the vocabulary expansion work in this <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/allocating-large-vocabulary-capacity-for-cross-lingual-language-model-pre-training\/\">recent research paper<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">PUBLICATION<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/allocating-large-vocabulary-capacity-for-cross-lingual-language-model-pre-training\/\" data-bi-cN=\"Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training\" data-external-link=\"false\" 
data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">PUBLICATION<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/consistency-regularization-for-cross-lingual-fine-tuning\/\" data-bi-cN=\"Consistency Regularization for Cross-Lingual Fine-Tuning\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Consistency Regularization for Cross-Lingual Fine-Tuning<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">PUBLICATION<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/xlm-e-cross-lingual-language-model-pre-training-via-electra\/\" data-bi-cN=\"XLM-E: Cross-lingual Language Model Pre-training via ELECTRA\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>XLM-E: Cross-lingual Language Model Pre-training via ELECTRA<\/span>&nbsp;<span 
class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">PUBLICATION<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/infoxlm-an-information-theoretic-framework-for-cross-lingual-language-model-pre-training\/\" data-bi-cN=\"InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training\" data-external-link=\"false\" data-bi-aN=\"citation\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h2 id=\"t-ulrv5-release-information\">T-ULRv5: Release information<\/h2>\n\n\n\n<p>The Turing team strongly believes in bringing the best of our AI technology into the hands of Microsoft customers as soon as possible. T-ULRv5 will soon deliver benefits to many of our existing product scenarios across products such as Microsoft Bing, Microsoft 365, Microsoft Edge, Microsoft Azure, and more. At Microsoft, our mission is to empower every person and every organization on the planet to achieve more\u2014regardless of where they live and what languages they speak. Thanks to the universal capabilities of T-ULRv5, we hope to one day make these benefits available to our customers.<\/p>\n\n\n\n<p>Microsoft Turing models are also available for custom application building through our private preview program. 
If you are interested in learning more about this and other Turing models, please complete the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/forms.office.com\/Pages\/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR1UCeHaYbjdIhNGA6afHCs1UOEtWRUJSUDBOUlM2TkpCQ01SMUlLVzJNMS4u\">early access request form<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. We work closely with Azure Cognitive Services to power current and future language services with Turing models. Therefore, existing Azure Cognitive Services customers will start to see these benefits as they become available.<\/p>\n\n\n\n<p>If you are a researcher who would like to work with us in assessing and improving Turing models, the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/collaboration\/microsoft-turing-academic-program\/#:~:text=We%20have%20created%20the%20Microsoft%20Turing%20Academic%20Program,researchers%20with%20a%20private%20preview%20of%20Turing%20models.\">Microsoft Turing Academic Program (MS-TAP)<\/a> allows you to submit a proposal and gain deeper access to these models.<\/p>\n\n\n\n<h2 id=\"building-and-democratizing-more-inclusive-ai\">Building and democratizing more inclusive AI<\/h2>\n\n\n\n<p>We are exploring multilingual technology to help democratize AI by addressing barriers, such as the lack of training data, the high cost of language modeling, and the complexity of multilingual systems. T-ULRv5 is an important milestone in this endeavor, as its cross-lingual transferability and zero-shot application paradigm provide a much more efficient and scalable framework for developing cross-lingual systems.<\/p>\n\n\n\n<p>We are motivated by the opportunity to further advance the state of the art and develop new multilingual capabilities to build more inclusive AI. 
For example, we are excited about exploring neural machine translation (NMT) and language generation with a cross-lingual encoder, as you can read in this <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/2106.13736.pdf\">research paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. We hope that our work will contribute to the community&#8217;s progress towards making AI more inclusive and accessible to all.<\/p>\n\n\n\n<p>The Microsoft Turing team welcomes your feedback and comments and looks forward to sharing more developments in the future.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, we are excited to announce that with our latest Turing universal language representation model (T-ULRv5), a Microsoft-created model is once again the state of the art and at the top of the Google XTREME public leaderboard (opens in new tab). Resulting from a collaboration between the Microsoft Turing team (opens in new tab) and [&hellip;]<\/p>\n","protected":false},"author":35981,"featured_media":779380,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Saurabh Tiwary","user_id":"39603"},{"type":"user_nicename","value":"Lidong 
Zhou","user_id":"32673"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-778711","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[691494,649749],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Lidong Zhou","user_id":32673,"display_name":"Lidong Zhou","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/lidongz\/\" aria-label=\"Visit the profile page for Lidong Zhou\">Lidong Zhou<\/a>","is_active":false,"last_first":"Zhou, Lidong","people_section":0,"alias":"lidongz"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"XTREME leaderboard showing T-ULRv5 at the top.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-1024x576.png 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/09\/XTREME-leaderboard_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"Saurabh Tiwary and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/lidongz\/\" title=\"Go to researcher profile for Lidong Zhou\" aria-label=\"Go to researcher profile for Lidong Zhou\" data-bi-type=\"byline author\" data-bi-cN=\"Lidong Zhou\">Lidong Zhou<\/a>","formattedDate":"September 28, 2021","formattedExcerpt":"Today, we are excited to announce that with our latest Turing universal language representation model (T-ULRv5), a Microsoft-created model is once again the state of the art and at the top of the Google XTREME public leaderboard (opens in new tab). 
Resulting from a collaboration&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/778711","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/35981"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=778711"}],"version-history":[{"count":19,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/778711\/revisions"}],"predecessor-version":[{"id":779602,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/778711\/revisions\/779602"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/779380"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=778711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=778711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=778711"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=778711"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=778711"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=778711"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/r
esearch\/wp-json\/wp\/v2\/msr-locale?post=778711"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=778711"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=778711"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=778711"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=778711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}