Models, Infrastructure and Hardware for Next-Generation AI Applications
AI innovation today is bound by the limitations of compute infrastructure, the effectiveness of machine learning models, and the ease of development.
Microsoft’s AI at Scale initiative is pioneering a new approach that will result in next-generation AI capabilities that are scaled across the company’s products and AI platforms.
This includes developing a new class of large, centralized AI models that can be scaled and specialized across product domains, as well as creating state-of-the-art hardware and infrastructure to power this new class of models.
AI at Scale builds on years of systems work by Microsoft researchers, particularly in the area of parallel computation, that makes it possible to train machine learning models at unprecedented scale far more quickly. For instance, Project Parasail, established in 2014, pioneered a novel approach to parallelizing a large class of seemingly sequential applications, particularly stochastic gradient descent, in which dependencies are treated at runtime as symbolic values. PipeDream, part of Project Fiddle, introduces a novel approach to model training called pipeline parallelism, which overcomes the higher communication costs of data parallelism and the hardware resource inefficiency of model parallelism. The result is up to 5.3 times faster training than traditional approaches. Read this blog post to learn more.
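To see why pipelining helps, consider a toy timing model (a hedged sketch, not PipeDream's actual scheduler): with S pipeline stages and M micro-batches, a pipeline needs roughly S + M - 1 steps, because stages work on different micro-batches concurrently after an initial fill, whereas running each micro-batch through all stages sequentially needs S x M steps.

```python
# Illustrative timing sketch of pipeline parallelism (an assumption-laden
# toy model, not PipeDream's implementation): a mini-batch is split into
# micro-batches that stream through model stages on different devices.

def pipeline_steps(num_stages: int, num_microbatches: int) -> int:
    """Steps for a pipelined forward pass: fill time plus steady state."""
    return num_stages + num_microbatches - 1

def sequential_steps(num_stages: int, num_microbatches: int) -> int:
    """Steps if each micro-batch runs through all stages before the next starts."""
    return num_stages * num_microbatches

stages, micro = 4, 8
print(pipeline_steps(stages, micro))    # 11 steps
print(sequential_steps(stages, micro))  # 32 steps
```

The gap widens as the number of micro-batches grows, which is why keeping the pipeline full matters in practice.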
These capabilities, together with others like DeepSpeed, are being integrated into the ONNX (Open Neural Network Exchange) Runtime, adding distributed training support to this open-source, high-performance runtime for machine learning models that is framework-agnostic and hardware-agnostic. This gives developers an efficient way to train and run inference on machine learning models using the framework and hardware of their choice.
DeepSpeed for large model training
DeepSpeed is a PyTorch-compatible library that vastly improves large model training by improving scale, speed, cost, and usability, unlocking the ability to train models with over 100 billion parameters. One piece of the DeepSpeed library, ZeRO-2, is a parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. DeepSpeed is open-sourced at https://github.com/microsoft/DeepSpeed. You can learn more about DeepSpeed and ZeRO in this blog post from February 2020 and this update on ZeRO from May 2020.
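The core idea behind ZeRO can be illustrated with a back-of-the-envelope memory calculation (a hedged sketch of the principle, not DeepSpeed's implementation): instead of replicating optimizer state on every data-parallel worker, each worker stores only its 1/N shard.

```python
# Rough memory sketch of optimizer-state partitioning, the idea behind ZeRO
# (not DeepSpeed's code). Adam keeps roughly two fp32 values per parameter
# (momentum and variance), i.e. about 8 bytes of optimizer state per param.

def replicated_state_bytes(params: int, bytes_per_state: int = 8) -> int:
    """Every data-parallel worker holds a full copy of the optimizer state."""
    return params * bytes_per_state

def zero_partitioned_state_bytes(params: int, workers: int,
                                 bytes_per_state: int = 8) -> int:
    """Each worker holds only its shard of the optimizer state."""
    return params * bytes_per_state // workers

p = 1_000_000_000  # a 1-billion-parameter model
print(replicated_state_bytes(p) / 1e9)            # 8.0 GB per worker
print(zero_partitioned_state_bytes(p, 64) / 1e9)  # 0.125 GB per worker
```

Because the per-worker footprint shrinks linearly with the number of workers, the parameter count that fits in a fixed amount of device memory grows accordingly.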
AI at Scale is enabling breakthroughs in areas such as natural language processing (NLP) and multi-modality (combining language with other types of data, such as images, video, and speech).
In September 2020, DeepSpeed was updated with new system technologies: trillion-parameter model training with 3D parallelism, ZeRO-Offload to enable training of 10x larger models, Sparse Attention to power 10x longer sequences and 6x faster execution, and 1-bit Adam with up to 5x communication volume reduction.
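The communication savings of 1-bit compression can be sketched in a few lines (a hedged illustration of sign-plus-scale quantization with error feedback, the idea behind 1-bit Adam; not DeepSpeed's implementation): each worker transmits only the sign of each value plus one shared scale, and carries the quantization error into the next step so information is not lost.

```python
# Toy 1-bit gradient compression with error feedback (an illustrative
# sketch, not DeepSpeed's 1-bit Adam code).

def compress_1bit(values, error):
    # Fold the residual error from the previous step back in first.
    corrected = [v + e for v, e in zip(values, error)]
    # One shared scale: the mean absolute value of the corrected gradient.
    scale = sum(abs(c) for c in corrected) / len(corrected)
    # Only the signs (1 bit each) plus the scale are communicated.
    signs = [1.0 if c >= 0 else -1.0 for c in corrected]
    decoded = [scale * s for s in signs]  # what the receiver reconstructs
    # Whatever quantization lost is remembered for the next step.
    new_error = [c - d for c, d in zip(corrected, decoded)]
    return signs, scale, new_error

grad = [0.5, -0.2, 0.1, -0.4]
err = [0.0] * 4
signs, scale, err = compress_1bit(grad, err)
print(signs)  # [1.0, -1.0, 1.0, -1.0]
```

Sending one bit per value instead of 32 is where the up-to-5x reduction in communication volume comes from once the shared scale is amortized over many values.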
Advances in natural language processing
Turing Natural Language Generation (T-NLG) is a 17-billion-parameter language model that outperforms the state of the art on many downstream NLP tasks. In particular, it can enhance the Microsoft Office experience through writing assistance and answering reader questions, and it paves the way for more fluent digital assistants. You can read more about T-NLG in this blog post. In September 2020, Bing announced new updates that make use of T-NLG to improve autosuggest results; read this blog post to learn more.
On the multi-modality language-image front, we've significantly outperformed the state of the art on downstream language-image tasks (e.g., visual search) with Oscar (Object-Semantics Aligned Pre-training).
Recently, pre-trained models such as Unicoder, M-BERT, and XLM have been developed to learn multilingual representations for cross-lingual and multilingual tasks. By performing masked language modeling, translation language modeling, and other bilingual pre-training tasks on multilingual and bilingual corpora with shared vocabulary and weights for multiple languages, these models achieve surprisingly good cross-lingual capability. However, the community still lacks benchmark datasets to evaluate that capability. To help researchers further advance language-agnostic models and make AI systems more inclusive, the XGLUE dataset lets researchers test a language model's zero-shot cross-lingual transfer capability – its ability to transfer what it learned in English to the same task in other languages. Download the dataset here, and read this blog post to learn more.
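The masked-language-modeling objective mentioned above is simple to state: hide a fraction of the input tokens and train the model to recover them from context. The sketch below illustrates only the masking step, with a hypothetical whitespace tokenization (not any of these models' actual preprocessing).

```python
import random

# Illustrative masking step for masked language modeling (a hedged sketch;
# real systems use subword tokenizers and extra corruption rules).

def mask_tokens(tokens, mask_rate=0.15, seed=1):
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # the model must predict this token
            masked.append("[MASK]")
        else:
            masked.append(tok)        # context the model can condition on
    return masked, targets

tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens)
print(masked)
```

In the multilingual setting, running this objective over corpora in many languages with shared vocabulary and weights is what pushes the representations toward being language-agnostic.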
We are incorporating these breakthroughs into the company’s products, including Bing, Office, Dynamics, and Xbox. Read this blog post to learn more.
Project Brainwave: new hardware for deep learning
In the realm of hardware, Project Brainwave is a deep learning platform for real-time AI inference in the cloud and on the edge. A soft Neural Processing Unit (NPU), based on a high-performance field-programmable gate array (FPGA), accelerates deep neural network (DNN) inferencing, with applications in computer vision and natural language processing. This approach is transforming computing by augmenting CPUs with an interconnected and configurable compute layer composed of programmable silicon.
With a high-performance, precision-adaptable FPGA soft processor, Microsoft datacenters can serve pre-trained DNN models with high efficiency at low batch sizes. Because FPGAs are reprogrammable, the platform remains flexible for continuous innovation and improvement, making the infrastructure future-proof.
By exploiting FPGAs as a datacenter-scale compute fabric, a single DNN model can be deployed as a scalable hardware microservice that spans multiple FPGAs to create web-scale services capable of processing massive amounts of data in real time.
Learn more about Project Brainwave >
Spell correction at scale
Customers around the world use Microsoft products in over 100 languages, yet most of these languages do not come with high-quality spell correction. This limits customers' ability to search for information on the web and in the enterprise, and even to author content. With AI at Scale, we used deep learning along with language families to solve this problem, building what we believe is the most comprehensive and accurate spelling correction system ever created in terms of language coverage and accuracy. Learn more in this blog post.
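For intuition, here is a deliberately tiny dictionary-based corrector (a toy sketch, in no way Microsoft's production system, which relies on deep learning and language families): propose the in-vocabulary word with the smallest edit distance to the input.

```python
# Toy spell correction via Levenshtein distance (an illustrative baseline,
# not the deep-learning system described above).

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def correct(word: str, vocabulary: list) -> str:
    """Return the vocabulary word closest to the input."""
    return min(vocabulary, key=lambda w: edit_distance(word, w))

vocab = ["search", "speech", "spell", "scale"]
print(correct("serch", vocab))  # search
```

A real multilingual system must go far beyond this, handling context, morphology, and languages with little training data, which is where sharing signal across language families pays off.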
Visit the Microsoft Innovation site to learn more about this initiative, including a deep dive into the technology.
Nominate your organization for a private preview of Semantic Search by Project Turing >
Visit the Microsoft Project Turing site >