In deep learning, researchers keep achieving higher performance by using larger models. However, two obstacles prevent the community from building larger models: (1) training larger models is more time-consuming, which slows down model design exploration, and (2) inference with larger models is also slow, which prevents their deployment in computation-constrained applications. In this talk, I will introduce some of our efforts to remove these obstacles. On the training side, we propose TernGrad to reduce the communication bottleneck and scale up distributed deep learning; on the inference side, we propose structurally sparse neural networks, which remove redundant neural components for faster inference. At the end, I will briefly introduce (1) my recent efforts to accelerate AutoML, and (2) future work applying my research to overcome scaling issues in Natural Language Processing.
Wei Wen is a Ph.D. candidate at Duke University and a Student Researcher at Google Brain. His research is in machine learning, with recent focus on automated machine learning, efficient deep learning, and scalable deep learning. He has held internships at Microsoft Research, Google Brain, Facebook AI, and HP Labs. Some of his proposed methods have been deployed in production AI systems, such as Facebook AI Infra, Intel Nervana, and PyTorch/Caffe2. He has authored or co-authored papers that received one Best Paper Award and three Best Paper Nominations in the Supercomputing and Electronic Design Automation communities. His research has been featured as a NeurIPS Oral, in invited talks at UC Berkeley and Cornell University, and in a guest lecture at Rice University. Homepage: http://www.pittnuts.com/.