UW – MSR Machine Learning workshop 2015 – Session 2
- Daniela Witten, Kean Ming Tan, and Abram Friesen | University of Washington
11:00 Sure Screening for Gaussian Graphical Models – Daniela Witten In an undirected graphical model, the nodes represent random variables, and an edge represents conditional dependence between the corresponding pair of nodes. In recent years, the statistical and machine learning communities have devoted considerable attention to the problem of learning the structure of such a graph in the high-dimensional setting. We propose graphical sure screening, or GRASS, an extremely simple and computationally efficient screening procedure for this task: we simply threshold the elements of the sample covariance matrix. With very high probability, the GRASS estimated edge set contains the true edge set; moreover, we can control the size of the estimated edge set and the rate of false positives. In empirical studies, GRASS outperforms far more complex and computationally demanding state-of-the-art estimators for high-dimensional graph estimation.
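The screening rule itself can be sketched in a few lines of Python. The sketch below only illustrates the thresholding step; the data, variable names, and threshold value are illustrative assumptions, and the theory governing how the threshold should be chosen is not reproduced here.

    import numpy as np

    def grass_screen(X, threshold):
        # Graphical sure screening sketch: retain edge (j, k) whenever the
        # absolute sample covariance between variables j and k exceeds the
        # threshold. X is an n-by-p data matrix; returns a set of index pairs.
        S = np.cov(X, rowvar=False)  # p-by-p sample covariance matrix
        p = S.shape[0]
        return {(j, k) for j in range(p) for k in range(j + 1, p)
                if abs(S[j, k]) > threshold}

    # Illustrative use on synthetic data; the threshold here is arbitrary.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    print(len(grass_screen(X, threshold=0.3)))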
11:25 Quantum Deep Learning – Ashish Kapoor In recent years, deep learning has had a profound impact on machine learning and artificial intelligence. An important open question is whether quantum computing will lead to improved methods for deep learning. We show that quantum computing can not only reduce the time required to train a deep restricted Boltzmann machine, but also provides a much more natural environment for deep learning than classical computing does. Specifically, it obviates the need to use the contrastive divergence approximation, to impose directionality on the graphical model, or to use greedy layer-by-layer training, thereby significantly improving the optimization of the underlying objective function. Our quantum approach also allows richer classes of models for deep learning, including full Boltzmann machines and multilayer, fully connected models, to be trained efficiently. This is joint work with Nathan Wiebe and Krysta Svore.
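For reference, the classical baseline this talk contrasts against can be sketched briefly: a restricted Boltzmann machine trained with one step of contrastive divergence (CD-1), the approximation the quantum approach is said to obviate. The layer sizes, learning rate, and synthetic data below are illustrative assumptions, not details from the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    n_visible, n_hidden, lr = 6, 3, 0.1
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)   # visible biases
    b_h = np.zeros(n_hidden)    # hidden biases

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0):
        # One contrastive-divergence step on a single binary visible vector:
        # approximate the log-likelihood gradient by data statistics minus
        # one-step reconstruction statistics.
        global W, b_v, b_h
        p_h0 = sigmoid(v0 @ W + b_h)                # hidden probabilities given data
        h0 = (rng.random(n_hidden) < p_h0) * 1.0    # sample hidden units
        p_v1 = sigmoid(h0 @ W.T + b_v)              # reconstruct visible units
        v1 = (rng.random(n_visible) < p_v1) * 1.0
        p_h1 = sigmoid(v1 @ W + b_h)                # hidden probabilities given reconstruction
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b_v += lr * (v0 - v1)
        b_h += lr * (p_h0 - p_h1)

    # Train on a tiny synthetic binary data set.
    data = (rng.random((50, n_visible)) < 0.5) * 1.0
    for _ in range(100):
        for v in data:
            cd1_update(v)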
11:50 Spotlight: Learning Graphical Models with Hubs – Kean Ming Tan We consider the problem of learning a high-dimensional graphical model in which there are a few hub nodes that are densely connected to many other nodes. Many authors have studied the use of an l1 penalty in order to learn a sparse graph in the high-dimensional setting. However, the l1 penalty implicitly assumes that each edge is equally likely and independent of all other edges. We propose a general framework to accommodate more realistic networks with hub nodes, using a convex formulation that involves a row-column overlap norm penalty. We apply this general framework to three widely used probabilistic graphical models: the Gaussian graphical model, the covariance graph model, and the binary Ising model. An alternating direction method of multipliers algorithm is used to solve the corresponding convex optimization problems. On synthetic data, we demonstrate that our proposed framework outperforms competitors that do not explicitly model hub nodes. We illustrate our proposal on a webpage data set and a gene expression data set.
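For concreteness, a row-column overlap norm penalty of the kind described here takes roughly the following form. This is a sketch, assuming the parameter matrix is decomposed as Theta = Z + V + V^T, with Z a sparse component and the columns of V capturing hub nodes; the exact handling of diagonal entries and the roles of the tuning parameters may differ from the authors' formulation.

    P(\Theta) = \min_{V, Z \,:\, \Theta = V + V^{\top} + Z}
        \Big\{ \lambda_1 \lVert Z \rVert_1
             + \lambda_2 \lVert V \rVert_1
             + \lambda_3 \sum_{j=1}^{p} \lVert V_{\cdot j} \rVert_2 \Big\}

The group lasso term on the columns of V encourages most columns to be exactly zero, so that only a small number of nodes are selected as hubs, while the l1 terms keep both components sparse.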
11:55 Spotlight: Recursive Decomposition for Nonconvex Optimization – Abram Friesen Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent sub-functions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. We observe experimentally that RDIS outperforms standard techniques on problems like structure from motion and protein folding.
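To make the recursion concrete, here is a minimal Python sketch of the set-decompose-recurse structure, under simplifying assumptions that are not part of RDIS itself: the objective is supplied as a list of terms with explicit variable scopes, the separator is chosen by a crude degree heuristic rather than graph partitioning, and the base case enumerates a coarse grid in place of gradient descent.

    import itertools

    def connected_components(terms, free_vars):
        # Group free variables into components linked by shared terms; once a
        # separator is fixed, each component can be optimized independently.
        comps, remaining = [], set(free_vars)
        while remaining:
            comp = {remaining.pop()}
            changed = True
            while changed:
                changed = False
                for vars_, _ in terms:
                    touched = set(vars_) & remaining
                    if touched and set(vars_) & comp:
                        comp |= touched
                        remaining -= touched
                        changed = True
            comps.append(comp)
        return comps

    def rdis(terms, free_vars, assignment, grid):
        # terms: (variables, function) pairs, each involving at least one free variable.
        if len(free_vars) <= 2:
            # Base case: brute-force the grid (a stand-in for gradient descent).
            best_val, best_asg = float("inf"), {}
            for values in itertools.product(grid, repeat=len(free_vars)):
                asg = {**assignment, **dict(zip(free_vars, values))}
                val = sum(f(asg) for _, f in terms)
                if val < best_val:
                    best_val, best_asg = val, dict(zip(free_vars, values))
            return best_val, best_asg
        # Choose the separator as the free variable appearing in the most terms.
        sep = max(free_vars, key=lambda v: sum(v in vars_ for vars_, _ in terms))
        rest = [v for v in free_vars if v != sep]
        best_val, best_asg = float("inf"), {}
        for value in grid:
            asg = {**assignment, sep: value}
            # Terms fully determined once the separator is set contribute now.
            total = sum(f(asg) for vars_, f in terms if not set(vars_) & set(rest))
            combined = {sep: value}
            for comp in connected_components(terms, rest):
                sub_terms = [(vs, f) for vs, f in terms if set(vs) & comp]
                sub_val, sub_asg = rdis(sub_terms, sorted(comp), asg, grid)
                total += sub_val
                combined.update(sub_asg)
            if total < best_val:
                best_val, best_asg = total, combined
        return best_val, best_asg

    # Toy usage: a chain-structured objective over four variables.
    terms = [
        (("a", "b"), lambda s: (s["a"] - s["b"]) ** 2),
        (("b", "c"), lambda s: (s["b"] + s["c"] - 1) ** 2),
        (("c", "d"), lambda s: (s["c"] - 2 * s["d"]) ** 2),
        (("a",), lambda s: (s["a"] - 1) ** 2),
    ]
    grid = [x / 2.0 for x in range(-4, 5)]   # -2.0 to 2.0 in steps of 0.5
    value, solution = rdis(terms, ["a", "b", "c", "d"], {}, grid)
    print(value, solution)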
Speaker Details
Daniela Witten’s research involves the development of statistical machine learning methods for high-dimensional data, with applications to genomics and other fields. Daniela is a co-author of the very popular textbook “Introduction to Statistical Learning”. She is the recipient of a number of honors, including an NIH Director’s Early Independence Award, a Sloan Research Fellowship, and an NSF CAREER Award. Daniela completed her BS in 2005 and her PhD in Statistics in 2010, both at Stanford University. Since 2014, she has been an associate professor of Statistics and Biostatistics at the University of Washington.
Kean Ming Tan is a fourth-year PhD student in the Department of Biostatistics at the University of Washington, working with Daniela Witten. He is interested in developing statistical methodologies for analyzing high-dimensional data. In particular, he has worked on topics related to clustering, graphical models, and effect size estimation. He enjoys casting real-world problems as statistical problems and figuring out how to solve them efficiently.