Microsoft Research Blog

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model 

October 11, 2021 | Ali Alvi and Paresh Kharya
We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration between Microsoft and NVIDIA to…

Recent Posts

  1. Chart: trend of sizes of state-of-the-art NLP models over time

    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model 

    October 11, 2021 | Ali Alvi and Paresh Kharya


  2. Chart: CTDC global dataset on victims of trafficking

    Real-world evidence and the path from data to impact 

    September 23, 2021 | Darren Edge and Jonathan Larson

    From the intense shock of the COVID-19 pandemic to the effects of climate change, our global society has never faced greater risk. The Societal Resilience team at Microsoft Research was established in recognition of this risk and tasked with developing open technologies that enable a scalable response in times of crisis. And just as we think about scalability…

  3. Diagram: MEB is a sparse neural network that takes binary features as input, embeds each feature as a 15-dimension vector, applies sum pooling within each of 49 feature groups, concatenates the results into a 735-dimension vector, and passes it through two dense layers to produce a click probability. The example features come from the query “Microsoft Windows” and the document www.microsoft.com/en-us/windows.

    Make Every feature Binary: A 135B parameter sparse neural network for massively improved search relevance 

    August 4, 2021 | Junyan Chen, Frédéric Dubut, Jason (Zengzhong) Li, and Rangan Majumder

    Recently, Transformer-based deep learning models like GPT-3 have been getting a lot of attention in the machine learning world. These models excel at understanding semantic relationships, and they have contributed to large improvements in Microsoft Bing’s search experience and surpassing human…

  4. Diagram: while the NNGP and NTK limits essentially consider only the network at initialization (the limits of the first forward pass and first backward pass, respectively), the feature learning limit introduced in this work incorporates the many forward-backward cycles of the entire SGD training trajectory.

    On infinitely wide neural networks that exhibit feature learning 

    July 22, 2021 | Edward Hu and Greg Yang

    In the pursuit of learning about fundamentals of the natural world, scientists have had success with coming at discoveries from both a bottom-up and top-down approach. Neuroscience is a great example of the former. Spanish anatomist Santiago Ramón y Cajal discovered the neuron in the…

  5. Micah Stampley, Lisa Nakamura posing for a photo

    Lecture series aims to help spur dialogue around race and technology 

    July 21, 2021

    In November, NYU media professor Charlton McIlwain joined fellow scholars Safiya Noble, Ruha Benjamin, and André Brock for a virtual discussion on anti-Blackness and technology hosted by the University of California Santa Barbara. The conversation was an engaging one, and McIlwain…

  6. Diagram: traditional cellular network infrastructure compared with cellular network infrastructure in the Microsoft cloud. In the traditional model, cell towers transfer data to local hubs, then central exchanges, and finally data centers. In the cloudified model, labelled “RAN in the cloud,” cell towers transmit data to telco edges and Microsoft edges, and the data then flows to the Microsoft cloud, including the core network and OSS/BSS as a service.

    Project Arno: How Microsoft Research created the technology and industry momentum for Azure to empower telecom operators in the cloud 

    July 19, 2021 | Yongguang Zhang and Bozidar Radunovic

    Editor’s note: In recent years, telecommunications operators have faced a growing challenge to meet surging global demand for immersive online services and collaboration tools. Upgrading their proprietary networks to prepare for 5G and beyond would require major capital expenditures, even as competition was driving down…
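The MEB architecture described in item 3 above is concrete enough to sketch: binary features are embedded as 15-dimension vectors, sum-pooled within 49 feature groups, concatenated into a 735-dimension vector, and passed through two dense layers to yield a click probability. The NumPy sketch below illustrates that forward pass only; the vocabulary size, hidden width, and weight initialization are assumptions for the example and are many orders of magnitude smaller than the real 135B-parameter model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the MEB diagram description; VOCAB and HIDDEN are
# assumed for this toy example, not taken from the post.
EMB_DIM = 15           # each binary feature becomes a 15-dim vector
NUM_GROUPS = 49        # features are sum-pooled within 49 groups
VOCAB = 1_000          # assumed number of distinct binary features
HIDDEN = 64            # assumed width of the two dense layers

# One shared embedding table for simplicity; MEB's real tables are
# vastly larger and sparse.
emb = rng.normal(0.0, 0.01, size=(VOCAB, EMB_DIM))

# Two dense layers mapping the 735-dim pooled vector to a click logit.
W1 = rng.normal(0.0, 0.01, size=(NUM_GROUPS * EMB_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.01, size=(HIDDEN, 1))
b2 = np.zeros(1)

def click_probability(active_features):
    """active_features: NUM_GROUPS lists, each holding the ids of the
    binary features that fired in that group for a query-document pair."""
    # Sum pooling within each feature group, then concatenation
    # into a single 49 * 15 = 735-dimension vector.
    pooled = [emb[ids].sum(axis=0) if ids else np.zeros(EMB_DIM)
              for ids in active_features]
    x = np.concatenate(pooled)
    h = np.maximum(0.0, x @ W1 + b1)      # dense layer 1 + ReLU
    logit = (h @ W2 + b2)[0]              # dense layer 2
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid -> click probability

# Example: one sparse query-document pair with one active feature
# per group (e.g. derived from "Microsoft Windows" vs. a document).
features = [[int(rng.integers(VOCAB))] for _ in range(NUM_GROUPS)]
p = click_probability(features)
```

Sum pooling keeps the input layer sparse: only the embeddings of features that actually fired are touched, which is what lets a model of this shape scale to billions of parameters while serving at low latency.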

Explore More

  • Events & conferences

    Events & conferences 

    Meet our community of researchers, learn about exciting research topics, and grow your network

  • Podcasts

    Podcasts 

    Ongoing conversations at the cutting edge of research

  • Microsoft Research Forum

    Microsoft Research Forum 

    Join us for a continuous exchange of ideas about research in the era of general AI