The need

Creating AI that augments our own capabilities requires training large models that can deeply understand our world. However, training models of this magnitude is restricted by existing machine learning techniques, compute infrastructure, and development tools.

The idea

We believe that creating next-generation AI requires centralizing our separate efforts to overcome the limitations of current model development. From there, we can scale these AI capabilities across our products and platforms.

The solution

Our AI at Scale initiative drives hardware and software infrastructure advancements that, in turn, allow the development of large, pre-trained AI models that are integrated across our products. These tools and techniques are open-sourced and available through Azure.

Technical details for AI at Scale

Our ability to develop large-scale models like Turing is due to our platform and software advancements. In terms of platform infrastructure, Azure AI uses a hyper-cluster of GPUs to provide a high-performance computing environment where models can be trained.

We developed and open-sourced software that makes it easier and faster to train models with trillions of parameters. DeepSpeed is a PyTorch library that brings together deep learning optimization and parallelization techniques. Zero Redundancy Optimizer (ZeRO), a novel parallelized optimizer within DeepSpeed, uses a dynamic communication schedule to partition model states across devices during training. As a result, ZeRO can train 100-billion-parameter deep learning models on existing GPU clusters with three to five times the throughput of the current system.
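To give a rough sense of why partitioning model states matters, the sketch below estimates per-device memory for model states with and without partitioning. It is a back-of-the-envelope illustration, not DeepSpeed code: the function name and stage numbering are hypothetical, and the byte counts (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for Adam optimizer states in mixed precision) are assumptions taken from the commonly cited ZeRO analysis rather than from this page. Activations and buffers are ignored.

```python
# Assumed per-parameter costs for mixed-precision Adam training (not from this page).
BYTES_FP16_PARAM = 2
BYTES_FP16_GRAD = 2
BYTES_OPTIMIZER_STATE = 12  # fp32 weight copy + momentum + variance


def per_device_bytes(num_params, num_devices, stage):
    """Estimate model-state memory per device (hypothetical helper).

    stage 0: nothing partitioned (plain data parallelism)
    stage 1: optimizer states partitioned across devices
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and parameters partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")

    params = BYTES_FP16_PARAM * num_params
    grads = BYTES_FP16_GRAD * num_params
    optimizer = BYTES_OPTIMIZER_STATE * num_params

    # Each stage spreads one more category of state evenly over the devices.
    if stage >= 1:
        optimizer /= num_devices
    if stage >= 2:
        grads /= num_devices
    if stage >= 3:
        params /= num_devices

    return params + grads + optimizer


if __name__ == "__main__":
    model_size = 100 * 10**9  # a 100-billion-parameter model
    devices = 400             # illustrative cluster size, chosen arbitrarily
    for stage in range(4):
        gigabytes = per_device_bytes(model_size, devices, stage) / 1e9
        print(f"stage {stage}: {gigabytes:,.1f} GB per device")
```

Under these assumptions, an unpartitioned 100-billion-parameter model needs far more memory for model states than any single GPU offers, while partitioning all three categories brings the per-device share down in proportion to the number of devices.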

We have also open-sourced ONNX (Open Neural Network Exchange) Runtime, a cross-platform machine learning training and inference engine that focuses on performance and scalability. It delivers up to 17 times faster inferencing and up to 1.4 times faster training. We are bringing DeepSpeed and other capabilities together in ONNX Runtime, and we are making it easier to train and deploy models by incorporating DeepSpeed and ONNX Runtime into Azure ML.

Our pre-trained language models are available to companies and institutions. To learn more about the options for AI at Scale across your organization, please sign up.

