
In the news | siliconANGLE

Microsoft AI tool enables ‘extremely large’ models with a trillion parameters 

September 11, 2020

Microsoft Corp. has released a new version of its open-source DeepSpeed tool that it says will enable the creation of deep learning models with a trillion parameters, more than five times as many as in the world’s current largest model.

Microsoft Research Blog

DeepSpeed: Extreme-scale model training for everyone 

September 10, 2020 | DeepSpeed Team, Rangan Majumder, and Junhua Wang

In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…

In the news | VentureBeat

Microsoft’s updated DeepSpeed can train trillion-parameter AI models with fewer GPUs 

September 10, 2020

Microsoft today released an updated version of its DeepSpeed library that introduces a new approach to training AI models containing trillions of parameters, the variables internal to the model that inform its predictions. The company claims the technique, dubbed 3D…
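The "3D" approach VentureBeat describes is DeepSpeed's 3D parallelism, which composes data, pipeline, and model (tensor) parallelism. As a rough illustration of the pipeline dimension alone, the sketch below uses DeepSpeed's PipelineModule API; the placeholder layers, stage count, and launch details are illustrative assumptions, not values from the article.

```python
# Illustrative sketch: the pipeline dimension of 3D parallelism in DeepSpeed.
# The 8-layer stack and 2 stages are placeholder assumptions; the data-parallel
# replication and tensor (model) parallelism that complete the "3D" are not shown.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# PipelineModule needs an initialized process group; real runs are started
# with the `deepspeed` launcher across multiple GPUs.
deepspeed.init_distributed()

layers = [nn.Linear(1024, 1024) for _ in range(8)]  # stand-in for transformer blocks

# PipelineModule splits the layer list across `num_stages` groups of GPUs and
# micro-batches activations through the stages to keep them all busy.
model = PipelineModule(layers=layers, num_stages=2)
```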

In the news | DeepSpeed.ai

Microsoft DeepSpeed achieves the fastest BERT training time 

May 27, 2020

Good news! DeepSpeed sets the fastest BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs. This is a 34% improvement over the best published result of 67 minutes in end-to-end training time to achieve the same accuracy on the…

Microsoft Research Blog

Research Collection: Tools and Data to Advance the State of the Art 

May 19, 2020

“This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.” —Sam Madden, Professor, Massachusetts Institute of Technology. An open…

Microsoft Research Blog

ZeRO-2 & DeepSpeed: Shattering barriers of deep learning speed & scale 

May 19, 2020 | DeepSpeed Team, Rangan Majumder, and Junhua Wang

In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has…

Microsoft Research Blog

Turing-NLG: A 17-billion-parameter language model by Microsoft 

February 13, 2020 | Corby Rosset

Turing Natural Language Generation (T-NLG) is a 17-billion-parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a…

Microsoft Research Blog

ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters 

February 13, 2020 | DeepSpeed Team, Rangan Majumder, and Junhua Wang

The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of their cost, training time, and the complexity of integrating them into existing code. Microsoft is releasing an open-source library called DeepSpeed, which vastly…
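As a hedged illustration of what enabling ZeRO looks like for a user, the sketch below shows a minimal DeepSpeed configuration with ZeRO stage 1, which partitions optimizer states across data-parallel ranks; the toy model, batch size, and learning rate are illustrative assumptions, not values from the announcement.

```python
# Minimal sketch: enabling ZeRO stage 1 via a DeepSpeed config (illustrative values).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # placeholder for a large transformer

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    # Stage 1 partitions optimizer states; stages 2 and 3 additionally
    # partition gradients and parameters for further memory savings.
    "zero_optimization": {"stage": 1},
}

# Returns a DeepSpeed engine that applies the ZeRO partitioning transparently
# during forward/backward/step; real runs use the `deepspeed` CLI launcher.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```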

In the news | WinBuzzer

Microsoft DeepSpeed with Zero Can Train 100 Billion Parameter AI Models 

February 11, 2020

Microsoft has released a new open-source library called DeepSpeed which, when combined with its ZeRO optimizer, can train 100-billion-parameter AI models without the hardware resources traditionally required at that scale.
