{"id":635340,"date":"2020-02-13T13:12:18","date_gmt":"2020-02-10T17:00:37","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=635340"},"modified":"2020-05-14T11:21:31","modified_gmt":"2020-05-14T18:21:31","slug":"zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters\/","title":{"rendered":"ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-635928 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-3.gif\" alt=\"chart DeepSpeed\" width=\"1400\" height=\"788\" \/>The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, time, and ease of code integration. Microsoft is <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/DeepSpeed\">releasing an open-source library called DeepSpeed<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. DeepSpeed is compatible with <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/pytorch.org\/\">PyTorch<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. 
One piece of that library, called ZeRO, is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. Researchers have used these breakthroughs to create Turing Natural Language Generation (<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\">Turing-NLG<\/a>), the largest publicly known language model at 17 billion parameters, which you can learn more about <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\">in this accompanying blog post.<\/a><\/p>\n<p>The Zero Redundancy Optimizer (abbreviated ZeRO) is a novel memory optimization technology for large-scale distributed deep learning. ZeRO can train deep learning models with 100 billion parameters on the current generation of GPU clusters at three to five times the throughput of the current best system. It also presents a clear path to training models with trillions of parameters, demonstrating an unprecedented leap in deep learning system technology. We are releasing ZeRO as part of DeepSpeed, our high-performance library for accelerating distributed deep learning training.<\/p>\n<h3>Challenges of training large deep learning models<\/h3>\n<p>Large models offer significant accuracy gains, but training billions to trillions of parameters frequently runs up against fundamental hardware limitations. 
To fit these models into memory, existing solutions make trade-offs between computation, communication, and development efficiency:<\/p>\n<p style=\"padding-left: 40px;\">\u2022 Data parallelism does not help reduce memory footprint per device: a model with more than 1 billion parameters runs out of memory even on GPUs with 32GB of memory.<\/p>\n<p style=\"padding-left: 40px;\">\u2022 Model parallelism does not scale efficiently beyond a single node due to fine-grained computation and expensive communication. Model parallelism frameworks frequently require extensive code integration that may be specific to the model architecture. For example, NVIDIA <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/nv-adlr.github.io\/MegatronLM\">Megatron-LM<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> set a new model size record of 8.3 billion parameters. It scales well for models of this size, which fit within the GPUs of a single node, but its performance degrades when scaling across nodes: we observe about five teraflops\/GPU when running 40 billion parameters across NVIDIA DGX-2 nodes.<\/p>\n<h3>Overcoming limitations of data parallelism and model parallelism with ZeRO<\/h3>\n<p>We developed ZeRO to overcome the limitations of data parallelism and model parallelism while achieving the merits of both. ZeRO removes the memory redundancies across data-parallel processes by partitioning the model states (parameters, gradients, and optimizer state) across those processes instead of replicating them.
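The core idea can be illustrated with a toy simulation (this is a sketch of the partitioning concept, not DeepSpeed's implementation): under plain data parallelism every rank keeps a full copy of the model states, while under ZeRO-style partitioning each rank owns only a 1/N<sub>d<\/sub> slice.

```python
# Toy illustration of ZeRO-style partitioning across data-parallel ranks.
# Each rank owns a contiguous [start, end) slice of the flat parameter array,
# instead of replicating the whole array as plain data parallelism does.
def partition(num_params, num_ranks):
    """Return the [start, end) slice of the flat parameter array owned by each rank."""
    per_rank = (num_params + num_ranks - 1) // num_ranks  # ceiling division
    return [(r * per_rank, min((r + 1) * per_rank, num_params))
            for r in range(num_ranks)]

# 10 parameters split over 4 data-parallel ranks:
print(partition(10, 4))  # → [(0, 3), (3, 6), (6, 9), (9, 10)]
```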
It uses a dynamic communication schedule during training to share the necessary state across distributed devices while retaining the computational granularity and communication volume of data parallelism.<\/p>\n<p>We call this <em>ZeRO-powered data parallelism<\/em>: it reduces per-device memory usage linearly with the degree of data parallelism while incurring a communication volume similar to that of standard data parallelism. ZeRO-powered data parallelism can fit models of arbitrary size, as long as the aggregated device memory is large enough to share the model states.<\/p>\n<h3>The three stages of ZeRO and its benefits<\/h3>\n<p>ZeRO has three main optimization stages (as depicted in Figure 1), which correspond to the partitioning of optimizer states, gradients, and parameters. When enabled cumulatively, they provide:<\/p>\n<p>1. Optimizer State Partitioning (P<sub>os<\/sub>) \u2013 4x memory reduction, same communication volume as data parallelism<\/p>\n<p>2. Add Gradient Partitioning (P<sub>os+g<\/sub>) \u2013 8x memory reduction, same communication volume as data parallelism<\/p>\n<p>3. Add Parameter Partitioning (P<sub>os+g+p<\/sub>) \u2013 Memory reduction is linear with the data parallelism degree N<sub>d<\/sub>. For example, splitting across 64 GPUs (N<sub>d<\/sub> = 64) yields a 64x memory reduction, at the cost of a modest 50% increase in communication volume.<\/p>\n<p>ZeRO eliminates these memory redundancies and makes the full aggregate memory capacity of a cluster available. With all three stages enabled, ZeRO can train a trillion-parameter model on just 1024 NVIDIA GPUs.
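The stage-by-stage arithmetic can be checked with a short calculation using the formulas in Figure 1: mixed-precision Adam training needs 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and K = 12 for fp32 optimizer states (master weights, momentum, variance).

```python
# Per-GPU memory (in GB) for mixed-precision Adam training under each ZeRO
# stage, following Figure 1. psi = parameter count, nd = data-parallel degree,
# k = optimizer-specific constant (12 for Adam in fp32).
def per_gpu_gb(psi, nd, k=12):
    GB = 1e9
    baseline = (2 + 2 + k) * psi / GB                    # everything replicated
    p_os     = (2 + 2) * psi / GB + k * psi / (nd * GB)  # stage 1: optimizer states
    p_os_g   = 2 * psi / GB + (2 + k) * psi / (nd * GB)  # stage 2: + gradients
    p_os_g_p = (2 + 2 + k) * psi / (nd * GB)             # stage 3: + parameters
    return baseline, p_os, p_os_g, p_os_g_p

# The 7.5B-parameter example from Figure 1, on 64 GPUs:
print(per_gpu_gb(7.5e9, 64))      # → (120.0, 31.40625, 16.640625, 1.875)

# A trillion parameters on 1024 GPUs with all three stages:
print(per_gpu_gb(1e12, 1024)[3])  # → 15.625, i.e. roughly 16GB per GPU
```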
A trillion-parameter model with an optimizer like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/1412.6980.pdf\">Adam<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in 16-bit precision requires approximately 16 terabytes (TB) of memory to hold the optimizer states, gradients, and parameters. 16TB divided by 1024 is 16GB, which is well within a reasonable bound for a GPU.<\/p>\n<div id=\"attachment_635352\" style=\"width: 961px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-635352\" class=\"wp-image-635352 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-1.png\" alt=\"chart\" width=\"951\" height=\"392\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-1.png 951w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-1-300x124.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-1-768x317.png 768w\" sizes=\"auto, (max-width: 951px) 100vw, 951px\" \/><p id=\"caption-attachment-635352\" class=\"wp-caption-text\">Figure 1: Memory savings and communication volume for the three stages of ZeRO compared with standard data parallel baseline. In the memory consumption formula, \u03a8 refers to the number of parameters in a model and <i>K<\/i> is the optimizer specific constant term. As a specific example, we show the memory consumption for a 7.5B parameter model using <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/1412.6980.pdf\">Adam<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> optimizer where <i>K<\/i>=12 on 64 GPUs. 
We also show the communication volume of ZeRO relative to the baseline.<\/p><\/div>\n<p>The video below shows how ZeRO (with all three stages) performs a training step, including the forward pass, backward pass, and parameter update.<\/p>\n<div style=\"width: 2160px;\" class=\"wp-video\"><video class=\"wp-video-shortcode\" id=\"video-635340-1\" width=\"2160\" height=\"900\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/Turing-Animation.mp4?_=1\" \/><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/Turing-Animation.mp4\">https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/Turing-Animation.mp4<\/a><\/video><\/div>\n<h3>DeepSpeed: PyTorch compatibility and system performance<\/h3>\n<p>We implemented ZeRO stage one, optimizer state partitioning (ZeRO-OS for short), which we have demonstrated can support 100-billion-parameter models. The code is being released together with our training optimization library, DeepSpeed. DeepSpeed brings state-of-the-art training techniques, such as ZeRO, distributed training, mixed precision, and checkpointing, through lightweight APIs compatible with <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/pytorch.org\/\">PyTorch<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.
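As a sketch of what that integration looks like (the configuration values below are illustrative assumptions, not taken from this post, and the commented-out calls follow the pattern documented in the DeepSpeed repository; exact signatures may vary by version):

```python
# Illustrative DeepSpeed configuration (values are assumptions for a sketch).
# "zero_optimization" enables ZeRO; stage 1 is optimizer state partitioning
# (ZeRO-OS), which partitions Adam states across data-parallel ranks.
ds_config = {
    "train_batch_size": 512,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

# Typical usage per the DeepSpeed repository (left commented so this sketch
# stays self-contained; it requires the deepspeed package and a real model):
# import deepspeed
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
# loss = model_engine(batch)   # forward
# model_engine.backward(loss)  # backward
# model_engine.step()          # parameter update

print(ds_config["zero_optimization"]["stage"])  # → 1
```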
With just a few lines of code changes to your PyTorch model, you can leverage DeepSpeed to address the underlying performance challenges and boost the speed and scale of your training.<\/p>\n<p>DeepSpeed excels in four aspects (as visualized in Figure 2):<\/p>\n<p style=\"padding-left: 40px;\">\u2022 <strong>Scale<\/strong>: State-of-the-art large models such as OpenAI GPT-2, NVIDIA Megatron-LM, and Google T5 have sizes of 1.5 billion, 8.3 billion, and 11 billion parameters, respectively. ZeRO stage one in DeepSpeed provides system support to run models of up to 100 billion parameters, 10 times bigger. In the future, we plan to add support for ZeRO stages two and three, unlocking the ability to train models from 200 billion parameters up to trillions of parameters.<\/p>\n<p style=\"padding-left: 40px;\">\u2022 <strong>Speed<\/strong>: We observe up to five times higher throughput than the state of the art across various hardware. For example, to train large models on the GPT family of workloads, DeepSpeed combines ZeRO-powered data parallelism with NVIDIA Megatron-LM model parallelism. On NVIDIA GPU clusters with low-bandwidth interconnect (without NVIDIA NVLink or InfiniBand), we achieve a 3.75x throughput improvement over using Megatron-LM alone for a standard GPT-2 model with 1.5 billion parameters. On NVIDIA DGX-2 clusters with high-bandwidth interconnect, for models of 20 to 80 billion parameters, we are three to five times faster. These throughput improvements come from DeepSpeed\u2019s higher memory efficiency and its ability to fit these models using a lower degree of model parallelism and larger batch sizes.<\/p>\n<p style=\"padding-left: 40px;\">\u2022 <strong>Cost<\/strong>: Improved throughput translates into significantly reduced training cost.
For example, to train a model with 20 billion parameters, DeepSpeed requires three times fewer resources.<\/p>\n<p style=\"padding-left: 40px;\">\u2022 <strong>Usability<\/strong>: Only a few lines of code changes are needed to enable a PyTorch model to use DeepSpeed and ZeRO. Compared to current model parallelism libraries, DeepSpeed does not require a code redesign or model refactoring. It also does not put limitations on model dimensions (such as number of attention heads, hidden sizes, and others), batch size, or any other training parameters. For models of up to six billion parameters, you can use data parallelism (powered by ZeRO) conveniently without requiring model parallelism, while in contrast, standard data parallelism will run out of memory for models with more than 1.3 billion parameters. ZeRO stages two and three will further increase the model size trainable with data parallelism alone. In addition, DeepSpeed supports flexible combination of ZeRO-powered data parallelism with model parallelism.<\/p>\n<div id=\"attachment_635355\" style=\"width: 985px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-635355\" class=\"wp-image-635355 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-2.png\" alt=\"chart\" width=\"975\" height=\"450\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-2.png 975w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-2-300x138.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/DeepSpeed-Image-2-768x354.png 768w\" sizes=\"auto, (max-width: 975px) 100vw, 975px\" \/><p id=\"caption-attachment-635355\" class=\"wp-caption-text\">Figure 2: DeepSpeed excels in scale, speed, cost and usability. 
The bottom left figure depicts the system throughput improvements of DeepSpeed (combining ZeRO-powered data parallelism with Megatron-LM model parallelism) over using Megatron-LM alone. The bottom right figure compares the trainable model size using data parallelism alone, with and without ZeRO.<\/p><\/div>\n<h3>Turing-NLG and DeepSpeed-powered large model training<\/h3>\n<div>\n<p>We leveraged ZeRO-OS in DeepSpeed to train a 17-billion-parameter Turing-NLG model with higher accuracy and higher training efficiency than current state-of-the-art approaches. Please refer to this <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\">blog<\/a>, which shows the new accuracy records the model establishes and its wide applications in free-form text generation, summarization, and answer synthesis.<\/p>\n<p><span lang=\"EN\">ZeRO-OS is complementary to and compatible with different types of model parallelism, and for large models that do not fit into a single node (around 20 billion parameters or more), it offers significant performance gains, resource savings, and flexibility in model design compared to using model parallelism alone. <\/span><\/p>\n<p>We use ZeRO-OS in combination with Megatron-LM from NVIDIA in DeepSpeed to train the Turing-NLG model. The memory savings from ZeRO-OS allow the Turing-NLG model to run with a 4x smaller model parallelism degree and a 4x larger batch size compared to using NVIDIA Megatron-LM alone. As a result, we achieve a 3x throughput gain. Additionally, we can train at a batch size of 512 with only 256 GPUs, compared to the 1024 GPUs needed with Megatron-LM alone. Finally, Megatron-LM cannot run this exact model: its structure is not supported because the number of attention heads (28) is not divisible by the model parallelism degree (16).
DeepSpeed makes this model not only feasible to run but efficient to train!<\/p>\n<p>For more details, please see the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/DeepSpeed\">DeepSpeed GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> repository and the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/zero-memory-optimization-towards-training-a-trillion-parameter-models\/\">ZeRO paper<\/a>. We are also working with the ONNX and ONNX Runtime communities on further integration of these techniques.<\/p>\n<p><strong>About the DeepSpeed Team:<\/strong> We are a group of system researchers and engineers\u2014Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Arash Ashari, Elton Zheng, Jing Zhao, Minjia Zhang, Niranjan Uma Naresh, Reza Yazdani Aminabadi, Shaden Smith, Yuxiong He (team lead)\u2014who are enthusiastic about performance optimization of large-scale systems. We have recently focused on deep learning systems, optimizing their speed to train, speed to convergence, and speed to develop!<\/p>\n<p>If this type of work interests you, the DeepSpeed team is hiring! Please visit our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/careers.microsoft.com\/us\/en\/search-results?keywords=optimizing%20deep%20learning%20libraries\">careers page<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, time, and ease of code integration. 
Microsoft is releasing an open-source library called DeepSpeed, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":635931,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":0,"footnotes":""},"categories":[194467],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-635340","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artifical-intelligence","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[678390,649749],"related-events":[],"related-researchers":[{"type":"guest","value":"deepspeed-team","user_id":"690909","display_name":"DeepSpeed Team","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/deepspeed\/#!people\" aria-label=\"Visit the profile page for DeepSpeed Team\">DeepSpeed Team<\/a>","is_active":true,"last_first":"Team, DeepSpeed","people_section":0,"alias":"deepspeed-team"},{"type":"guest","value":"rangan-majumder","user_id":"635934","display_name":"Rangan  Majumder","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ranganm\/\" aria-label=\"Visit the profile page for Rangan  Majumder\">Rangan  
Majumder<\/a>","is_active":true,"last_first":"Majumder, Rangan ","people_section":0,"alias":"rangan-majumder"},{"type":"guest","value":"junhua-wang","user_id":"635838","display_name":"Junhua  Wang","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/junhuaw\/\" aria-label=\"Visit the profile page for Junhua  Wang\">Junhua  Wang<\/a>","is_active":true,"last_first":"Wang, Junhua ","people_section":0,"alias":"junhua-wang"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-960x540.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-1024x577.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-768x433.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-640x360.png 640w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/02\/MSResearch_20200207_DeepZeroBlogGraphic_r2t3_1400x788-1.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/deepspeed\/#!people\" title=\"Go to researcher profile for DeepSpeed Team\" aria-label=\"Go to researcher profile for DeepSpeed Team\" data-bi-type=\"byline author\" data-bi-cN=\"DeepSpeed Team\">DeepSpeed Team<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ranganm\/\" title=\"Go to researcher profile for Rangan  Majumder\" aria-label=\"Go to researcher profile for Rangan  Majumder\" data-bi-type=\"byline author\" data-bi-cN=\"Rangan  Majumder\">Rangan  Majumder<\/a>, and <a href=\"https:\/\/www.linkedin.com\/in\/junhuaw\/\" title=\"Go to researcher profile for Junhua  Wang\" aria-label=\"Go to researcher profile for Junhua  Wang\" data-bi-type=\"byline author\" data-bi-cN=\"Junhua  Wang\">Junhua  Wang<\/a>","formattedDate":"February 13, 2020","formattedExcerpt":"The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, time, and ease of code integration. 
Microsoft is releasing an open-source library called DeepSpeed, which vastly advances large model training by&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/635340","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=635340"}],"version-history":[{"count":74,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/635340\/revisions"}],"predecessor-version":[{"id":690912,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/635340\/revisions\/690912"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/635931"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=635340"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=635340"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=635340"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=635340"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=635340"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=635340"},{"taxonom
y":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=635340"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=635340"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=635340"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=635340"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=635340"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}