DeepSpeed

Extreme Speed and Scale for DL Training and Inference

  1. 2023
    Feb

    DeepSpeed supports automatic tensor parallelism for HuggingFace models

    Previously, a user needed to provide an injection policy to DeepSpeed to enable tensor parallelism. DeepSpeed now supports automatic tensor parallelism for HuggingFace models by default, as long as kernel injection is not enabled and an injection policy is not provided. This lets users improve the performance of models that are not currently supported via kernel injection, without writing an injection policy. See our tutorial on the new automatic tensor parallelism feature for inference.
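
    A minimal sketch of the workflow, assuming a two-GPU run through the deepspeed launcher (deepspeed --num_gpus 2 infer.py); the model name here is an illustrative choice, not a requirement:

        import os
        import torch
        import deepspeed
        from transformers import AutoModelForCausalLM, AutoTokenizer

        local_rank = int(os.getenv("LOCAL_RANK", "0"))
        world_size = int(os.getenv("WORLD_SIZE", "1"))

        model_name = "gpt2"  # any supported HuggingFace model
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)

        # With kernel injection disabled and no injection policy provided,
        # DeepSpeed shards the model with automatic tensor parallelism.
        engine = deepspeed.init_inference(
            model,
            mp_size=world_size,
            dtype=torch.float16,
            replace_with_kernel_inject=False,
        )

        inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
        outputs = engine.module.generate(**inputs, max_new_tokens=32)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))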

  2. 2022
    Dec

    DeepSpeed Data Efficiency Library: Towards Less Data, Faster Training, and Higher Model Quality

    DeepSpeed releases a new Data Efficiency Library that reduces training data and cost while boosting model quality, through new innovations in data sampling and data routing with composable and customizable library support. The library greatly reduces training cost while maintaining model quality (1.5-2x less data and time for GPT-3/BERT pretraining), or further improves model quality under the same training cost (>1 point gain in GPT-3-1.3B zero/few-shot evaluation).
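
    The sketch below shows the general shape of the DeepSpeed config that enables both techniques; it is a hedged outline, and the nested metric and schedule settings (elided here) follow the data efficiency tutorial:

        # Both techniques live under a single "data_efficiency" section of the
        # DeepSpeed config; the elided sub-keys are documented in the tutorial.
        ds_config = {
            "train_batch_size": 256,
            "data_efficiency": {
                "enabled": True,
                "seed": 1234,
                # data sampling: curriculum learning presents easier samples first
                "data_sampling": {
                    "enabled": True,
                    "curriculum_learning": {"enabled": True},  # + metric/schedule keys
                },
                # data routing: random layerwise token dropping (random-LTD)
                "data_routing": {
                    "enabled": True,
                    "random_ltd": {"enabled": True},  # + layer/schedule keys
                },
            },
        }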

  3. 2022
    Nov

    Achieve sub-second Stable Diffusion Image Generation with DeepSpeed-MII

    Stable Diffusion is a latent text-to-image diffusion model capable of creating stunning art within seconds. In this tutorial you will learn how to deploy and run Stable Diffusion with state-of-the-art performance optimizations from DeepSpeed-Inference and DeepSpeed-MII, and achieve image generation in under one second.
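
    A minimal sketch of such a deployment with the MII API of that release; the deployment name is illustrative, and gated checkpoints may additionally require a HuggingFace auth token in mii_config:

        import mii

        # Deploy Stable Diffusion behind a local inference endpoint.
        mii.deploy(task="text-to-image",
                   model="CompVis/stable-diffusion-v1-4",
                   deployment_name="sd_deploy",
                   mii_config={"dtype": "fp16"})

        # Query the deployment; the response carries the generated image(s).
        generator = mii.mii_query_handle("sd_deploy")
        result = generator.query({"query": ["a photo of an astronaut riding a horse"]})

        # Tear down the deployment when finished.
        mii.terminate("sd_deploy")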

  4. 2022
    Oct

    DeepSpeed Model Implementations for Inference (MII)

    DeepSpeed-MII, a new open-source Python library from DeepSpeed, is now available to make low-latency, low-cost inference of powerful deep learning models not only feasible but also easily accessible to everyone. MII offers highly optimized implementations of thousands of widely used deep learning models; for example, MII speeds up Stable Diffusion by 1.9x and the BigScience BLOOM 176B model by 5.7x with a 40x cost reduction.
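
    As an illustration, a text-generation deployment follows the same deploy-and-query pattern (the model and deployment names here are illustrative choices):

        import mii

        # Deploy a text-generation model behind a local inference endpoint.
        mii.deploy(task="text-generation",
                   model="bigscience/bloom-560m",
                   deployment_name="bloom_deploy")

        generator = mii.mii_query_handle("bloom_deploy")
        result = generator.query({"query": ["DeepSpeed is"]}, max_new_tokens=64)
        print(result)

        # Tear down the deployment when finished.
        mii.terminate("bloom_deploy")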

  5. 2022
    Sep

    ZeRO-Inference: Democratizing massive model inference

    ZeRO-Inference comes from the family of ZeRO technologies, a collection of powerful memory and parallelism optimizations for efficient large-scale model training and inference on modern GPU clusters. ZeRO-Inference enables inference computation of massive models (with hundreds of billions of parameters) on as few as a single GPU, making massive model inference accessible to almost everyone. Moreover, by hosting model weights in significantly cheaper CPU or NVMe memory, it dramatically reduces GPU memory requirements and the cost of massive model inference, offering an affordable inference path to SOTA models.
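
    A minimal sketch, following the ZeRO-Inference recipe of a ZeRO stage 3 config with parameter offload; the model name is an illustrative stand-in, since the technique targets much larger checkpoints:

        import torch
        import deepspeed
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_name = "gpt2"  # stand-in for a multi-hundred-billion-parameter model
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)

        ds_config = {
            "fp16": {"enabled": True},
            "zero_optimization": {
                "stage": 3,  # ZeRO-Inference builds on ZeRO stage 3
                # Host weights in cheap CPU memory; "nvme" (with an "nvme_path"
                # entry) pushes them out to even cheaper storage.
                "offload_param": {"device": "cpu", "pin_memory": True},
            },
            "train_micro_batch_size_per_gpu": 1,  # required field, unused here
        }

        # Weights stream to the GPU layer by layer as the forward pass needs them.
        engine = deepspeed.initialize(model=model, config=ds_config)[0]
        engine.module.eval()

        inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(engine.device)
        with torch.no_grad():
            outputs = engine.module.generate(**inputs, max_new_tokens=32)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))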

  6. 2022
    Jul

    DeepSpeed helped train 176 Billion parameter BLOOM model

    The 176B BLOOM model has been trained using Megatron-DeepSpeed, a combination of two main technologies: DeepSpeed and Megatron-LM. DeepSpeed developed a 3D parallelism based implementation by combining ZeRO sharding and pipeline parallelism from the DeepSpeed library with tensor parallelism from Megatron-LM.
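
    An illustrative sketch (not the actual BLOOM launch configuration) of how 3D parallelism factors a cluster: each GPU is addressed by a (data, pipeline, tensor) coordinate, and the three degrees multiply to the world size. The degrees below are hypothetical example values:

        TENSOR = 4     # Megatron-LM tensor parallelism, usually within a node
        PIPELINE = 12  # DeepSpeed pipeline parallelism across nodes
        DATA = 8       # ZeRO-sharded data parallelism over model replicas

        WORLD_SIZE = TENSOR * PIPELINE * DATA  # 384 GPUs, e.g. 48 nodes x 8 GPUs

        # One common rank layout: tensor varies fastest, data slowest.
        def coords(rank):
            tensor = rank % TENSOR
            pipeline = (rank // TENSOR) % PIPELINE
            data = rank // (TENSOR * PIPELINE)
            return data, pipeline, tensor

        print(coords(0))    # (0, 0, 0)
        print(coords(383))  # (7, 11, 3)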

  7. DeepSpeed Compression: A composable library for extreme compression

    DeepSpeed releases a new pillar, DeepSpeed Compression, to tackle the latency and cost challenges of deploying large-scale deep learning models. It offers novel compression algorithms and supports synergistic composition of state-of-the-art compression methods, making inference faster and models smaller while dramatically reducing compression cost. With this release we demonstrated 32x smaller model size, 5.2x better efficiency, and 5000x lower compression cost.
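
    A minimal sketch of the compression workflow, assuming a DeepSpeed config file whose "compression_training" section selects techniques such as quantization, pruning, or layer reduction (the config path and model are illustrative):

        from transformers import AutoModelForSequenceClassification
        from deepspeed.compression.compress import init_compression, redundancy_clean

        model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

        # Wrap the selected layers with compression operators.
        model = init_compression(model, "ds_config.json")

        # ... fine-tune / distill the wrapped model as usual ...

        # Fold the compression operators away, yielding the final smaller model.
        model = redundancy_clean(model, "ds_config.json")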

  8. Azure and DeepSpeed empower easy-to-use and high-performance model training

    Azure ML, Azure HPC, and DeepSpeed collaborated to make large-scale distributed training easier and more efficient on Azure using DeepSpeed technology. We developed and released simple-to-use training pipelines for both Azure ML and Azure HPC. Azure and DeepSpeed combined offer excellent performance and scalability: we have scaled model sizes to 2 trillion parameters, scaled various workloads to 1024 A100-80GB GPUs, and obtained up to 1.8x higher throughput compared to the latest results published on other cloud providers.

  9. 2022
    Mar

    DeepSpeed support for efficient large model training on AMD GPUs

    We are excited to announce that DeepSpeed’s suite of training optimizations for efficient large model training is now available on ROCm-enabled AMD GPUs. This means that powerful parallelism and memory optimizations such as ZeRO, ZeRO-Offload, ZeRO-Infinity, and 3D parallelism can be used when training with AMD GPUs.

  10. 2021
    Oct

    DeepSpeed trained the world’s most powerful language model: Megatron-Turing NLG 530B

    We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration between Microsoft and NVIDIA to further parallelize and optimize the training of very large AI models.