-
Luca Cardelli, Microsoft Research |I will discuss a connection between an efficient and well-studied distributed consensus algorithm (Approximate Majority), expressed as a 4-reaction chemical system, and a Nobel Prize-winning biological switch found in the cell cycle of all eukaryotes, from yeast to us. This connection is an example of ‘network morphisms’ that preserve both structure and functionality across systems of different complexity.
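The talk does not spell the reactions out, but the standard 4-reaction Approximate Majority system (in the formulation of Angluin, Aspnes and Eisenstat) can be simulated directly. The sketch below is illustrative, not the speaker's own presentation; the step budget and the catalyst-first convention for interaction pairs are assumptions of this sketch.

```python
import random

def approximate_majority(x, y, seed=0, max_steps=200_000):
    """Simulate the 4-reaction Approximate Majority chemical system
    over species X, Y and a 'blank' species B:
        X + Y -> X + B      Y + X -> Y + B   (minority gets blanked)
        X + B -> X + X      Y + B -> Y + Y   (blanks are recruited)
    Returns the species that wins the consensus, or None if the
    simulation did not converge within max_steps interactions."""
    rng = random.Random(seed)
    counts = {"X": x, "Y": y, "B": 0}
    for _ in range(max_steps):
        if counts["Y"] == 0 and counts["B"] == 0:
            return "X"
        if counts["X"] == 0 and counts["B"] == 0:
            return "Y"
        # Pick a random ordered pair of molecules; by convention here
        # the first molecule acts as the catalyst of the reaction.
        pop = [s for s, c in counts.items() for _ in range(c)]
        first, second = rng.sample(pop, 2)
        if first != "B" and second not in ("B", first):
            counts[second] -= 1    # X + Y -> X + B  (and symmetric)
            counts["B"] += 1
        elif first != "B" and second == "B":
            counts["B"] -= 1       # X + B -> X + X  (and symmetric)
            counts[first] += 1
        # Other pairs (same species, or a blank drawn first) do not react.
    return None
```

Starting from a majority of X, the population converges rapidly (in O(n log n) interactions) to an all-X consensus with high probability, which is the efficiency property the abstract refers to.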
-
Dan Alistarh, Microsoft Research |The need to process larger and larger amounts of data as efficiently as possible has been one of the main trends of the past decade. This new set of requirements significantly changes the way data structures are designed, implemented and employed. In this talk, I will give a couple of examples of such new data structure designs, and describe some of the major challenges in the area.
-
Robert E. Tarjan, Princeton University and Microsoft Research |In the study and use of algorithms, theory and practice interact, as do algorithm and data structure design. To illustrate this, I’ll discuss recent theoretical and practical results on the problem of finding dominators in a flow graph and on the disjoint set union problem. The former is an important basic computation in optimizing compilers; the latter is a classical problem in data structures with diverse applications. Fast algorithms to find dominators rely on disjoint set union algorithms and extensions. In practice, simple versions of these algorithms beat theoretically faster ones. A theoretical analysis of a randomized disjoint set union algorithm provides an explanation of these empirical results.
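The disjoint set union problem mentioned above has a classical solution whose simple variants are exactly the ones that perform well in practice. A textbook sketch (union by rank with path compression; the class and method names are this sketch's own):

```python
class DisjointSetUnion:
    """Disjoint set union with union by rank and path compression.
    The amortized cost per operation is O(alpha(n)), where alpha is
    the inverse Ackermann function -- effectively constant."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # First pass: locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the
        # path directly at the root, flattening future queries.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False               # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx           # attach shorter tree under taller
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

Randomized variants of this scheme (e.g., linking by random priorities instead of rank) are the kind analyzed in the theoretical results the talk describes.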
-
Laurent Massoulie, MSR/INRIA joint research centre |Community detection, akin to graph partitioning, consists in identifying groups of similar items within a population based on observed interactions. In the context of online social networks, it is a useful primitive for recommending either contacts or news items to users. We will consider a particular generative probabilistic model for the observations, namely the so-called stochastic block model, and generalizations thereof. We will describe spectral methods for community detection. Exploiting results on the spectrum of random graphs, we will establish the consistency of these approaches under suitable assumptions, namely the presence of a sufficiently strong signal in the observed data. We will also discuss open questions on phase transitions for detectability in such models when the signal becomes weak. In particular, we will introduce a novel spectral method that provably allows detection of communities down to a critical threshold, thereby settling a conjecture of Decelle, Krzakala, Moore and Zdeborová. We will conclude with open problems of current interest in the field.
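To make the basic spectral approach concrete: on a two-block stochastic block model with a strong signal, the sign pattern of the second eigenvector of the adjacency matrix recovers the communities. The sketch below (parameter values, the power-iteration implementation, and iteration counts are this sketch's choices; it is the vanilla method, not the talk's novel estimator for the weak-signal regime):

```python
import math
import random

def sbm_adjacency(n_per_block, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a two-block
    stochastic block model."""
    n = 2 * n_per_block
    labels = [0] * n_per_block + [1] * n_per_block
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                A[i][j] = A[j][i] = 1.0
    return A, labels

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v):
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]

def power_iteration(A, orth=None, iters=200, seed=1):
    """Dominant eigenvector of symmetric A; if orth is given, the
    iterate is re-orthogonalized against it each step (deflation),
    so the iteration converges to the next eigenvector instead."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0, 1) for _ in A])
    for _ in range(iters):
        if orth is not None:
            d = sum(a * b for a, b in zip(v, orth))
            v = [a - d * b for a, b in zip(v, orth)]
        v = normalize(matvec(A, v))
    return v

rng = random.Random(0)
A, labels = sbm_adjacency(50, 0.5, 0.05, rng)

# The leading eigenvector is roughly constant (the Perron vector);
# the sign pattern of the second eigenvector splits the communities
# when the signal p_in - p_out is sufficiently strong.
v1 = power_iteration(A)
v2 = power_iteration(A, orth=v1, seed=2)
guess = [1 if x > 0 else 0 for x in v2]
agree = sum(g == l for g, l in zip(guess, labels))
accuracy = max(agree, len(labels) - agree) / len(labels)
```

The `max` over the labelling and its flip accounts for the arbitrary sign of the eigenvector. The detectability questions in the talk concern what happens when `p_in - p_out` shrinks toward the critical threshold, where this naive spectrum is swamped by noise eigenvalues.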
-
Olya Ohrimenko, Microsoft Research |Cloud storage has emerged as the next generation of data storage where users can remotely store their data and leave its management to a third party, e.g., Microsoft Azure. However, the fact that users no longer have physical possession of their data raises new challenges in terms of data privacy. Storing the data in encrypted form is a key component in maintaining privacy against the storage provider. However, encryption alone is not enough, since information may be leaked through the pattern in which users access the data. In this talk, I will describe algorithms that allow data-oblivious access to remotely stored data. That is, the access patterns of such algorithms depend only on the size of the outsourced data and of the algorithm's input, not on their content. Hence, such algorithms reveal nothing about the data they are processing.
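A classical building block for data-oblivious algorithms is a sorting network, whose sequence of compared positions is fixed by the input size alone. The following minimal sketch (odd-even transposition sort, with an explicit access trace to make the obliviousness visible) is an illustration of the property, not the specific constructions from the talk:

```python
def oblivious_sort(values):
    """Sort via an odd-even transposition network.  The schedule of
    (i, j) positions compared depends only on len(values), never on
    the data, so an observer of the memory access pattern learns
    nothing about the contents being sorted."""
    a = list(values)
    n = len(a)
    trace = []                            # positions touched, in order
    for rnd in range(n):                  # n rounds suffice to sort
        for i in range(rnd % 2, n - 1, 2):
            trace.append((i, i + 1))      # data-independent schedule
            lo, hi = min(a[i], a[i + 1]), max(a[i], a[i + 1])
            a[i], a[i + 1] = lo, hi       # compare-exchange
    return a, trace
```

Running it on two different inputs of the same length produces identical traces, which is exactly the guarantee described above: the provider sees the same accesses regardless of content.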
-
Pushmeet Kohli, Microsoft Research |Many problems in computer vision and machine learning require inferring the most probable states of certain hidden or unobserved variables. This inference problem can be formulated in terms of minimizing a function of discrete variables. The scale and form of computer vision problems raise many challenges in this optimization task. For instance, functions encountered in vision may involve millions or sometimes even billions of variables. Furthermore, the functions may contain terms that encode very high-order interactions among variables. These properties make the minimization of such functions with conventional algorithms extremely expensive computationally. In this talk, I will discuss how many of these challenges can be overcome by exploiting the sparse and heterogeneous nature of discrete optimization problems encountered in real-world computer vision problems. Such problem-aware approaches to optimization can lead to substantial improvements in running time and allow us to produce good solutions to many important problems.
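A small illustration of how exploiting sparse structure tames such minimization problems: when the interaction graph is a chain, the energy can be minimized exactly by dynamic programming in time polynomial in the number of labels, instead of exponential exhaustive search. This sketch uses a generic chain energy, not any specific model from the talk:

```python
def minimize_chain_energy(unary, pairwise):
    """Exact minimization of
        E(x) = sum_i unary[i][x_i] + sum_i pairwise[x_i][x_{i+1}]
    over a chain of n variables with k labels each, by dynamic
    programming.  Exploiting the chain's sparse structure gives
    O(n * k^2) time, versus k^n for exhaustive enumeration."""
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])          # best energy of a prefix, per label
    back = []                      # backpointers for recovery
    for i in range(1, n):
        new_cost, ptr = [], []
        for b in range(k):
            best = min(range(k), key=lambda a: cost[a] + pairwise[a][b])
            new_cost.append(cost[best] + pairwise[best][b] + unary[i][b])
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    # Backtrack from the best final label to the optimal labelling.
    x = [min(range(k), key=lambda b: cost[b])]
    for ptr in reversed(back):
        x.append(ptr[x[-1]])
    return list(reversed(x)), min(cost)
```

Grid-structured vision problems are not chains, but the same principle (algorithms tailored to the sparsity pattern of the interactions, such as graph cuts or message passing) is what makes them tractable at scale.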