On The Algorithmic and System Interface of BIG LEARNING


February 13, 2014


In many modern applications built on massive data and using high-dimensional models, such as web-scale content extraction via topic models, genome-wide association mapping via sparse regression, and image understanding via deep neural network, one needs to handle BIG machine learning problems that threaten to exceed the limit of current infrastructures and algorithms. While ML community continues to strive for new scalable algorithms, and several attempts on developing new system architectures for BIG ML have emerged to address the challenge on the backend, good dialogs between ML and system remain difficult — most algorithmic research remain disconnected from the real system/data they are to face; and the generality, programmability, and theoretical guarantee of most systems on ML programs remain largely unclear. In this talk, I will present Petuum – a general-purpose framework for distributed machine learning, and demonstrate how innovations in scalable algorithms and distributed systems design work in concert to achieve multiple orders of magnitude of scalability on a modest cluster for a wide range of large scale problems in social network (mixed-membership inference on 100M node), personalized genome medicine (sparse regression on 100M dimensions), and computer vision (classification over 20K labels), with provable guarantee on correctness of distributed inference.


Eric Xing

Dr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology; especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional, multimodal, and dynamic possible worlds in social and biological systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University, and another Ph.D. in Computer Science from UC Berkeley. His current work involves, 1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient models, sparse structured input/output models, and nonparametric Bayesian models; 2) computational and statistical analysis of gene regulation, genetic variation, and disease associations; and 3) large-scale systems for machine learning. Professor Xing has published over 190 peer-reviewed papers, and is an associate editor of the Annals of Applied Statistics (AOAS), the Journal of American Statistical Association (JASA), the IEEE Transaction of Pattern Analysis and Machine Intelligence (PAMI), the PLoS Journal of Computational Biology, and an Action Editor of the Machine Learning Journal (MLJ), the Journal of Machine Learning Research (JMLR). He is a member of the DARPA Information Science and Technology (ISAT) Advisory Group, a recipient of the NSF Career Award, the Sloan Fellowship, the United States Air Force Young Investigator Award, the IBM Open Collaborative Research Award, and best paper awards in a number of premier conferences including UAI, ACL, SDM, and ISMB. He is the Program Chair of ICML 2014 to take place in Beijing.