Deep Learning Compiler and Optimizer

Overview

This project aims to build deep learning compiler and optimizer infrastructure that provides automatic scalability and efficiency optimizations for both distributed and local execution. The stack covers two types of general optimization: fast distributed training over large-scale servers and efficient local execution on various hardware devices. Our current optimizations target many parts of the system stack, including fast distributed training over RDMA, automatic computation placement across devices, automatic operator batching and kernel fusion, a tensor algebra compiler, and sparsity and quantization optimizations.
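To give a sense of one of the optimizations mentioned above, the following is a minimal, hypothetical sketch of the idea behind kernel fusion: two elementwise operators that would each traverse the data (and materialize an intermediate buffer) are combined into a single pass. The function names and shapes here are illustrative only, not part of the project's actual API.

```python
def unfused(x, scale, bias):
    # Two separate "kernels": each makes a full pass over the data
    # and the first one materializes an intermediate buffer.
    scaled = [v * scale for v in x]      # kernel 1: elementwise multiply
    return [v + bias for v in scaled]    # kernel 2: elementwise add

def fused(x, scale, bias):
    # Fused kernel: both operations are applied per element in one pass,
    # avoiding the intermediate buffer and a second traversal.
    return [v * scale + bias for v in x]

# Both forms compute the same result; fusion changes only the execution.
assert unfused([1.0, 2.0, 3.0], 2.0, 1.0) == fused([1.0, 2.0, 3.0], 2.0, 1.0)
```

In a real compiler this transformation happens on an intermediate representation of the operator graph and generates a single device kernel, but the payoff is the same: fewer memory round-trips and fewer kernel launches.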

People