Distributed DNN platform

Established: August 1, 2016

Deep neural networks (DNNs) are an important and practical machine learning capability. This project focuses on distributing DNN training across a cluster of machines, building on our parameter server framework. The research in this project includes: 1) supporting efficient distributed DNN training in the Microsoft CNTK project by introducing features such as asynchronous training, efficient sparse model training, GPU-based optimization, a rich set of neural-network algorithms, and model parallelism for training very large models; 2) inventing effective distributed optimization methods for non-convex problems that can significantly boost DNN training performance; 3) helping the DNN research community by enabling distributed training in many DNN toolkits, e.g. Torch, Theano, and Caffe, on top of our parameter server framework.
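To make the idea of asynchronous training against a parameter server concrete, here is a minimal sketch. It is not the project's implementation: the `ParameterServer` class, its `pull`/`push` methods, and the helper functions are hypothetical names chosen for illustration, the "cluster" is a handful of threads in one process, and the model is a tiny linear regression rather than a DNN. What it does show is the core pattern: each worker pulls the current weights, computes a gradient on its own sample, and pushes an update without waiting for the other workers, so the weights it used may already be stale.

```python
import random
import threading


class ParameterServer:
    """Hypothetical in-process stand-in for a parameter server."""

    def __init__(self, dim):
        self._weights = [0.0] * dim
        self._lock = threading.Lock()

    def pull(self):
        # Return a snapshot of the current weights.
        with self._lock:
            return list(self._weights)

    def push(self, grad, lr):
        # Apply one gradient step; callers never wait for each other.
        with self._lock:
            for i, g in enumerate(grad):
                self._weights[i] -= lr * g


def worker(server, data, steps, lr, seed):
    rng = random.Random(seed)
    for _ in range(steps):
        w = server.pull()  # may be stale by the time we push
        x, y = rng.choice(data)
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        grad = [err * xi for xi in x]  # gradient of 0.5 * err**2
        server.push(grad, lr)


def train_async(data, dim, n_workers=4, steps=2000, lr=0.05):
    server = ParameterServer(dim)
    threads = [
        threading.Thread(target=worker, args=(server, data, steps, lr, k))
        for k in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.pull()


def make_data(n=200, seed=0):
    # Noise-free samples from y = 2*x0 - 3*x1 (an assumed toy target).
    rng = random.Random(seed)
    true_w = [2.0, -3.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, sum(wi * xi for wi, xi in zip(true_w, x))))
    return data


if __name__ == "__main__":
    print(train_async(make_data(), dim=2))
```

In a real deployment the server would be sharded across machines and the workers would hold DNN replicas on GPUs; the sketch only captures the pull, compute, push loop and the absence of a synchronization barrier between workers.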