DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

10x Larger Models | 10x Faster Training | Minimal Code Change

DeepSpeed can train deep learning (DL) models with over a hundred billion parameters on the current generation of GPU clusters, while achieving over 10x better system performance than the state of the art. Early adopters of DeepSpeed have already produced a language model (LM) with over 17B parameters, called Turing-NLG, establishing a new state of the art (SOTA) in the LM category.

DeepSpeed is an important part of Microsoft’s new AI at Scale initiative to enable next-generation AI capabilities at scale. Take a deep dive into large-scale AI across Microsoft.