About
Local algorithms, that is, algorithms that compute and make decisions on parts of the output considering only a portion of the input, have been studied in a number of areas in theoretical computer science and mathematics. Some of these areas include sublinear-time algorithms, distributed algorithms, and inference in large networks and graphical models. These communities have similar goals but a variety of approaches, techniques, and methods. This workshop was aimed at fostering dialogue and cross-pollination of ideas between the various communities. To this end, the workshop featured longer talks that, in part, surveyed approaches by various communities, as well as short, focused talks on recent, exciting results.
Overview talks
 Emmanuel Abbe, Princeton University
 Christian Borgs, Microsoft Research
 Mohsen Ghaffari, ETH Zurich
 Elchanan Mossel, Massachusetts Institute of Technology
 Dana Ron, Tel Aviv University
Organizing committee
Confirmed participants
 Maryam Aliakbarpour, Massachusetts Institute of Technology
 Zeyuan Allen-Zhu, Massachusetts Institute of Technology
 Alkida Balliu, Gran Sasso Science Institute
 Tugkan Batu, London School of Economics and Political Science
 Clement Canonne, Columbia University
 Artur Czumaj, University of Warwick
 Laurent Feuilloley, University of Paris Diderot
 Pierre Fraigniaud, University of Paris Diderot
 David Gamarnik, MIT Sloan School of Management
 Shafi Goldwasser, Massachusetts Institute of Technology
 Mika Goos, University of Toronto
 Elena Grigorescu, Purdue University
 Amin Karbasi, Yale University
 Kevin Matulef, Sandia National Laboratories
 Moti Medina, Max Planck Institute
 Jason Morton, Pennsylvania State University
 Meiram Murzabulatov, Pennsylvania State University
 Huy L. Nguyen, Northeastern University
 Dennis Olivetti, Gran Sasso Science Institute
 Krzysztof Onak, IBM
 Lorenzo Orecchia, Boston University
 Asu Ozdaglar, Massachusetts Institute of Technology
 John Peebles
 Vijaya Ramachandran, University of Texas at Austin
 Sofya Raskhodnikova, Pennsylvania State University
 Adi Rosen, University of Paris Diderot
 Devavrat Shah, Massachusetts Institute of Technology
 Asaf Shapira, Tel Aviv University
 Christian Sohler, Technical University of Dortmund
 Hsin-Hao Su
 Madhu Sudan, Harvard University
 Jukka Suomela, Aalto University
 Robert Tarjan, Princeton University
 Ali Vakilian, Massachusetts Institute of Technology
 Greg Valiant, Stanford University
 Shai Vardi, California Institute of Technology
 Nithin Varma, Pennsylvania State University
 Gandikota Venkata
 Anak Yodpinyanee, Massachusetts Institute of Technology
Agenda
Day 1  Friday, October 14
Time  Session  Speaker
8:30–9:00  Registration and Breakfast
9:00–9:10  Opening remarks  Ronitt Rubinfeld
9:10–9:50  The power of locality for network algorithms  Christian Borgs
9:50–10:10  Latent Space Model (aka Blind Regression)  Devavrat Shah
10:10–10:30  Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods  Asu Ozdaglar
10:30–10:50  Break
10:50–11:30  A short (and partial) survey on Partition Oracles  Dana Ron
11:30–11:50  Local Computation Algorithms for High Degree Graphs  Shai Vardi
11:50–12:10  Testing Graph Properties with a Polynomial Query Complexity  Asaf Shapira
12:10–12:30  A Local Algorithm for Constructing Spanners in Minor-Free Graphs  Reut Levi
12:30–2:10  Lunch
2:10–2:50  Local to global amplification: the block model vignette  Emmanuel Abbe
2:50–3:10  On Limits of Local Algorithms for Solving Random Constraint Satisfaction Problems  David Gamarnik
3:10–3:30  Locality in Coding Theory  Madhu Sudan
3:30–3:50  Local algorithms, message passing and the hidden clique problem  Yash Deshpande
3:50–4:20  Break
4:20–5:00  A few perspectives on local algorithms for sparse graphs  Elchanan Mossel
5:00–5:20  Sublinear Time Algorithms via Optimization  Zeyuan Allen-Zhu
5:20–5:40  An Optimization Approach to Local Graph Partitioning  Lorenzo Orecchia
5:40–6:00  Erasure-Resilient Property Testing  Sofya Raskhodnikova
Day 2  Saturday, October 15
Time  Session  Speaker
8:30–9:00  Registration and Breakfast
9:00–9:40  Improved Distributed Local Graph Algorithms  Mohsen Ghaffari
9:40–10:00  Non-Local Probes Do Not Help with Many Graph Problems  Moti Medina
10:00–10:20  Local Conflict Coloring  Pierre Fraigniaud
10:20–10:40  Designing Local Algorithms with Algorithms  Jukka Suomela
10:40–11:10  Break
11:10–11:30  Random walks approach in property testing  Artur Czumaj
11:30–11:50  Approximating the Spectrum of a Graph  Christian Sohler
11:50–12:20  Poster session
12:20–2:00  Lunch + poster session (cont'd)
2:00–2:20  Fast Constrained Submodular Maximization  Amin Karbasi
2:20–2:40  Sublinear Random-Access Generators for Preferential Attachment Graphs  Adi Rosen
2:40–3:00  Robust Learning and the Semi-Verified Model  Greg Valiant
3:00–3:20  Heavy hitters via cluster-preserving clustering  Huy L. Nguyen
3:20–3:40  Slow inconsistent statistics  Jason Morton
3:40–4:10  Break
4:10–4:30  Testing K-Monotonicity  Elena Grigorescu
4:30–4:50  HyperHeadTail: a Streaming Algorithm for Estimating the Degree Distribution of Dynamic Multigraphs  Kevin Matulef
4:50–5:10  Computing with Unknown Noise Rate  Jared Saia
5:10–5:30  Concurrent Disjoint Set Union  Robert Tarjan
5:30–5:50  Locality and Parallelism  Krzysztof Onak
Abstracts
Day 1  Friday, October 14
The power of locality for network algorithms
Speaker: Christian Borgs
TBA
Latent Space Model (aka Blind Regression)
Speaker: Devavrat Shah
In this talk, we shall discuss the Latent Space Model in the context of various modern applications: recommendation systems, crowdsourcing, and demand prediction in retail. We'll briefly discuss the connection between the popular "collaborative filtering" heuristic and the first-order Taylor expansion in the context of this model. Time permitting, we'll discuss desiderata for local algorithms in the context of these models.
Based on joint works with Christina Lee, Dogyoon Song, Jehangir Muhammad and Yehua Li.
Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods
Speaker: Asu Ozdaglar
We focus on the problem of minimizing the sum of smooth component functions (where the sum is strongly convex) and a nonsmooth convex function, which arises in regularized empirical risk minimization in machine learning and distributed constrained optimization in wireless sensor networks and smart grids. We consider solving this problem using the proximal incremental aggregated gradient (PIAG) method, which at each iteration moves along an aggregated gradient (formed by incrementally updating gradients of component functions according to a deterministic order) and takes a proximal step with respect to the nonsmooth function. While the convergence properties of this method with randomized orders (in updating gradients of component functions) have been investigated, this paper, to the best of our knowledge, is the first study that establishes the convergence rate properties of the PIAG method for any deterministic order. In particular, we show that the PIAG algorithm is globally convergent with a linear rate provided that the step size is sufficiently small. We explicitly identify the rate of convergence and the corresponding step size to achieve this convergence rate. Our results improve upon the best known condition number dependence of the convergence rate of the incremental aggregated gradient methods used for minimizing a sum of smooth functions.
This is joint work with Denizcan Vanli and Mert Gurbuzbalaban.
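A minimal numerical sketch of the PIAG iteration described above, assuming a one-dimensional problem, an L1 regularizer as the nonsmooth term, and a cyclic update order (all illustrative choices, not the paper's exact setup):

```python
import math

def soft_threshold(x, t):
    # Proximal operator of t*|x| (the nonsmooth term).
    return math.copysign(max(abs(x) - t, 0.0), x)

def piag(grads, x0, step, reg, iters):
    """PIAG sketch: keep a table of stale component gradients, refresh one
    component per iteration in a fixed (deterministic) cyclic order, move
    along the aggregated gradient, then take a proximal step w.r.t. reg*|x|."""
    x = x0
    stored = [g(x) for g in grads]   # table of (possibly stale) gradients
    m = len(grads)
    for k in range(iters):
        i = k % m                    # deterministic cyclic order
        stored[i] = grads[i](x)      # incrementally refresh one gradient
        agg = sum(stored)            # aggregated gradient
        x = soft_threshold(x - step * agg, step * reg)
    return x
```

With `reg = 0` and two quadratic components such as `0.5*(x-1)^2` and `0.5*(x-3)^2`, the iterates converge linearly to the minimizer `x = 2` once the step size is small enough, matching the qualitative behavior the abstract describes.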
A short (and partial) survey on Partition Oracles
Speaker: Dana Ron
In this talk I will present the notion of a Partition Oracle, as defined by Hassidim, Kelner, Nguyen and Onak. I will describe several partition oracles, for various families of graphs, and show how such oracles can be applied to solve a variety of approximation problems in sublinear time.
Local Computation Algorithms for High Degree Graphs
Speaker: Shai Vardi
Local Computation Algorithms (abbreviated LCA) is a computational model aimed at problem instances with huge inputs and outputs. For graph problems, the input graph is accessed using probes, where a probe specifies the ID of a vertex and receives in reply the IDs of its neighbors. Given a local query (e.g., "is a certain vertex in the vertex cover of the input graph?"), an LCA should compute the corresponding local output (e.g., "yes" or "no") while making only a small number of probes. The requirement is that all local outputs form a single global solution (e.g., a legal vertex cover).
In this talk, I will discuss the possibilities and limitations of LCAs that are required to work on graphs that may have arbitrarily large degrees (possibly linear in $n$) but are limited to a number of probes that is significantly smaller than the minimal degree. I will present negative and positive results: probe complexity lower bounds for approximating a minimum vertex cover and finding a maximal matching, and an algorithm for finding an approximate maximum matching on high-girth regular graphs that uses a constant number of probes.
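To make the probe model concrete, here is a hedged sketch, in the spirit of classical random-ranking LCAs rather than the talk's specific algorithms, of an LCA that answers "is vertex v in the maximal independent set?" using only neighbor probes (the `probe_neighbors` interface and helper names are illustrative):

```python
import random

def make_lca_mis(probe_neighbors, seed=0):
    """Answer 'is v in the MIS?' queries using only probes.

    Each vertex gets a fixed pseudorandom rank; v joins the independent set
    iff no lower-ranked neighbor joins (a local simulation of the greedy
    algorithm over the rank order).  Because ranks are a fixed function of
    the vertex ID, answers to different queries are consistent with one
    global maximal independent set.
    """
    cache = {}

    def rank(v):
        # deterministic per-vertex rank derived from the vertex ID
        return random.Random(hash((seed, v))).random()

    def in_mis(v):
        if v not in cache:
            r = rank(v)
            # recurse only into strictly lower-ranked neighbors: terminates
            cache[v] = all(not in_mis(u)
                           for u in probe_neighbors(v) if rank(u) < r)
        return cache[v]

    return in_mis
```

On any graph, querying every vertex yields a set that is independent and maximal, regardless of the random ranks.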
Testing Graph Properties with a Polynomial Query Complexity
Speaker: Asaf Shapira
About 10 years ago, I proved with N. Alon that every hereditary graph property can be tested with a constant number of queries. Unfortunately, the constants involved were enormous. A natural question is therefore to decide if for natural graph properties one can get a polynomial bound.
Our first result in this work is a very simple sufficient combinatorial criterion which guarantees that a graph property can be tested with a polynomial number of queries. This allows us to recover (almost) all previous positive results in the area, as well as obtain some new ones.
Our second main result is a very simple sufficient combinatorial criterion which guarantees that a graph property cannot be tested with a polynomial number of queries. This allows us to recover (almost) all previous negative results in the area, as well as obtain some new ones.
Joint work with L. Gishboliner.
A Local Algorithm for Constructing Spanners in Minor-Free Graphs
Speaker: Reut Levi
Constructing a spanning tree of a graph is one of the most basic tasks in graph theory. We consider this problem in the setting of local algorithms: one wants to quickly determine whether a given edge $e$ is in a specific spanning tree, without computing the whole spanning tree, but rather by inspecting the local neighborhood of $e$. The challenge is to maintain consistency, that is, to answer queries about different edges according to the same spanning tree. Since it is known that this problem cannot be solved without essentially viewing the whole graph, we consider the relaxed version of finding a spanning subgraph with $(1+\epsilon)n$ edges instead of $n-1$ edges (where $n$ is the number of vertices and $\epsilon$ is a given approximation/sparsity parameter). It is known that this relaxed problem requires inspecting $\Omega(\sqrt{n})$ edges in general graphs (for any constant $\epsilon$), which motivates the study of natural restricted families of graphs. One such family is the family of graphs with an excluded minor (which in particular includes planar graphs). For this family there is an algorithm that achieves constant success probability and inspects $(d/\epsilon)^{\mathrm{poly}(h)\log(1/\epsilon)}$ edges (for each edge it is queried on), where $d$ is the maximum degree in the graph and $h$ is the size of the excluded minor. The distances between pairs of vertices in the spanning subgraph $G'$ are at most a factor of $\mathrm{poly}(d, 1/\epsilon, h)$ larger than in $G$.
In this work, we show that for an input graph that is $H$-minor free for any $H$ of size $h$, this task can be performed by inspecting only $\mathrm{poly}(d, 1/\epsilon, h)$ edges in $G$. The distances between pairs of vertices in the spanning subgraph $G'$ are at most a factor of $\tilde{O}(h\log(d)/\epsilon)$ larger than in $G$. Furthermore, the error probability of the new algorithm is significantly improved to $\Theta(1/n)$.
This algorithm can also be easily adapted to yield an efficient algorithm for the distributed (message passing) setting.
Joint work with Dana Ron and Ronitt Rubinfeld.
Local to global amplification: the block model vignette
Speaker: Emmanuel Abbe
I will attempt to describe a framework in which variables need to be recovered from their local noisy interactions, and for which a single phase transition takes place between the computationally solvable and information-theoretically unsolvable regimes. The idea is based on amplifying local thresholds to global ones. Our case study will be exact recovery in the stochastic block model. We will also discuss how this breaks down for the detection problem, with the emergence of computational gaps.
On Limits of Local Algorithms for Solving Random Constraint Satisfaction Problems
Speaker: David Gamarnik
The talk will discuss algorithms for finding solutions of randomly generated constraint satisfaction problems, such as the random K-SAT problem, or for finding a largest independent set of a graph. We establish a fundamental barrier on the power of local algorithms to solve such problems, despite some conjectures put forward in the past. In particular, we refute a conjecture regarding the power of so-called i.i.d. factors to find nearly-largest independent sets in random regular graphs. Similarly, we show that a broad class of sequential local algorithms is incapable of finding satisfying assignments for a random NAE-K-SAT problem above a certain asymptotic threshold, below which even simple algorithms succeed with high probability. Our negative results exploit the fascinating geometry of feasible solutions of random constraint satisfaction problems, which was first predicted heuristically by physicists and has now been confirmed by rigorous methods. According to this picture, the solution space exhibits a clustering property whereby the feasible solutions tend to cluster according to the underlying Hamming distance. We show that the success of local algorithms would imply a violation of this clustering property, leading to a contradiction.
Joint work with Madhu Sudan, Microsoft Research.
Locality in Coding Theory
Speaker: Madhu Sudan
I will survey the concept of locality in coding theory, including the notions of locally decodable codes, locally testable codes, local reconstruction codes, etc. Several of these notions are accompanied by strikingly strong constructions, and a few have even turned out to be valuable (measured in $$$s).
Local algorithms, message passing and the hidden clique problem
Speaker: Yash Deshpande
The hidden clique problem asks one to find a large clique, or completely connected subgraph, in an otherwise random Erdős–Rényi graph. This problem was originally posed in theoretical computer science as an average-case version of the maximum clique problem. More recently, it has seen intense interest from a variety of communities as a statistical estimation problem wherein computationally efficient algorithms are far from achieving known statistically optimal thresholds. I will discuss local algorithms in the context of the hidden clique problem and generalizations, which form the basis of the best-known algorithms for this setting.
This is joint work with Andrea Montanari.
A few perspectives on local algorithms for sparse graphs
Speaker: Elchanan Mossel
I will give a few different perspectives on local algorithms in sparse graphs coming from inference and optimization on random graphs, Bayesian economics on sparse graphs and deep learning.
Sublinear Time Algorithms via Optimization
Speaker: Zeyuan Allen-Zhu
In this talk, I will discuss how to use optimization to obtain sublinear algorithms, as well as an interpolation between sublineartime and lineartime algorithms using optimization insights and acceleration techniques.
An Optimization Approach to Local Graph Partitioning
Speaker: Lorenzo Orecchia
Local clustering algorithms find a good approximation to a local sparse cut around a seed node while running in nearly-linear time in the size of the output. The main idea behind many of these algorithms is spectral in nature: the algorithms simulate a random walk starting at the seed node. The target local cut will limit the mixing, allowing the walk to both approximate the conductance of the cut and maintain the locality of the algorithm. In this talk, I will consider an optimization interpretation of these algorithms. Besides giving us a more principled framework for thinking about random walks for local clustering, the optimization view will also allow us to discuss non-spectral analogues of these algorithms and introduce local maximum flow computations. I will show how this novel local primitive can be combined with random walks to yield improved guarantees for local clustering. Finally, I will discuss a number of open questions about localizing SDP-based global approximation algorithms for sparsest cut using local random walks and local maximum flows.
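As an illustration of the random-walk primitive behind such local clustering algorithms, here is a sketch of a local "push" routine for approximate personalized PageRank, in the style of Andersen, Chung, and Lang (parameter names and the naive scan for a heavy vertex are illustrative simplifications):

```python
def approximate_pagerank(adj, seed, alpha=0.15, eps=1e-4):
    """Local 'push' sketch: mass is moved from a residual vector r into an
    estimate p only at vertices where r(v)/deg(v) >= eps, so both the work
    and the support of p stay local to the seed vertex."""
    p, r = {}, {seed: 1.0}

    def heavy():
        # naive scan for a vertex with large normalized residual
        for v, rv in r.items():
            if rv / len(adj[v]) >= eps:
                return v
        return None

    v = heavy()
    while v is not None:
        rv = r[v]
        p[v] = p.get(v, 0.0) + alpha * rv        # settle an alpha fraction
        r[v] = (1 - alpha) * rv / 2              # lazy walk keeps half at v
        share = (1 - alpha) * rv / (2 * len(adj[v]))
        for u in adj[v]:                         # spread the rest to neighbors
            r[u] = r.get(u, 0.0) + share
        v = heavy()
    return p, r
```

Each push conserves total mass and settles at least `alpha * eps` of it, so the loop terminates after at most `1/(alpha*eps)` pushes; on a graph made of two triangles joined by one edge, the estimate concentrates on the seed's side of the sparse cut.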
ErasureResilient Property Testing
Speaker: Sofya Raskhodnikova
Property testers form an important class of sublinear algorithms. In the standard property testing model, an algorithm accesses the input function via an oracle that returns function values at all queried domain points. In many realistic situations, the oracle may be unable to reveal the function values at some domain points due to privacy concerns, or when some of the values get erased by mistake or by an adversary.
We initiate a study of property testers that are resilient to the presence of adversarially erased function values. An alpha-erasure-resilient epsilon-tester for a property P is given parameters alpha, epsilon in (0,1), along with oracle access to a function f such that at most an alpha fraction of the function values have been erased. The tester does not know whether a point is erased unless it queries that point. The tester has to accept with high probability if there is a way to assign values to the erased points such that the resulting function satisfies P. It has to reject with high probability if, for every assignment of values to the erased points, the resulting function has to be changed on at least an epsilon-fraction of the non-erased domain points to satisfy P.
We design erasure-resilient property testers for a large class of properties. For some properties, it is possible to obtain erasure-resilient testers by using standard testers as a black box. But there are more challenging properties for which all known testers rely on querying a specific point. If this point is erased, all these testers break. We give efficient erasure-resilient testers for several classes of such properties, including monotonicity, the Lipschitz property, and convexity. Finally, we describe a property that can be epsilon-tested with O(1/epsilon) queries in the standard model, whereas testing it in the erasure-resilient model requires a number of queries polynomial in the input size.
Joint work with Kashyap Dixit, Abhradeep Thakurta, and Nithin Varma.
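A toy sketch of the model (not the paper's algorithm): the tester queries points, may receive erasures (here `None`), and must base its decision only on the non-erased values it sees. The pair-sampling strategy and parameter choices below are illustrative.

```python
import random

def erasure_resilient_monotonicity_test(f, n, eps, trials=None, seed=0):
    """Illustrative erasure-aware sortedness tester for f: [n] -> numbers,
    where f(i) may return None (erased).  Sample random pairs i < j, ignore
    any pair touching an erasure, and reject on a witnessed violation."""
    rng = random.Random(seed)
    trials = trials or int(10 / eps)
    for _ in range(trials):
        i, j = sorted(rng.sample(range(n), 2))
        fi, fj = f(i), f(j)
        if fi is None or fj is None:
            continue                  # erased point: carries no information
        if fi > fj:
            return False              # violation of monotonicity
    return True
```

A monotone function with a few erasures is always accepted; a decreasing function is rejected with overwhelming probability because every non-erased pair is a violation.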
Day 2  Saturday, October 15
Improved Distributed Local Graph Algorithms
Speaker: Mohsen Ghaffari
TBA
Non-Local Probes Do Not Help with Many Graph Problems
Speaker: Moti Medina
This work bridges the gap between distributed and centralised models of computing in the context of sublinear-time graph algorithms. A priori, typical centralised models of computing (e.g., parallel decision trees or centralised local algorithms) seem to be much more powerful than distributed message-passing algorithms: centralised algorithms can directly probe any part of the input, while in distributed algorithms nodes can only communicate with their immediate neighbours. We show that for a large class of graph problems, this extra freedom does not help centralised algorithms at all: efficient stateless deterministic centralised local algorithms can be simulated with efficient distributed message-passing algorithms. In particular, this enables us to transfer existing lower bound results from distributed algorithms to centralised local algorithms.
Joint work with Mika Goos, Juho Hirvonen, Reut Levi, and Jukka Suomela.
Local Conflict Coloring
Speaker: Pierre Fraigniaud
Locally finding a solution to symmetry-breaking tasks such as vertex-coloring, edge-coloring, maximal matching, maximal independent set, etc., is a long-standing challenge in distributed network computing. More recently, it has also become a challenge in the framework of centralized local computation. We introduce conflict coloring as a general symmetry-breaking task that includes all the aforementioned tasks as specific instantiations; conflict coloring includes all locally checkable labeling tasks from [Naor & Stockmeyer, STOC 1993]. Conflict coloring is characterized by two parameters $l$ and $d$, where the former measures the amount of freedom given to the nodes for selecting their colors, and the latter measures the number of constraints to which colors of adjacent nodes are subject. We show that, in the standard LOCAL model for distributed network computing, if $l/d > \Delta$, then conflict coloring can be solved in $\tilde{O}(\sqrt{\Delta})+\log^*n$ rounds in $n$-node graphs with maximum degree $\Delta$, where $\tilde{O}$ ignores polylog factors in $\Delta$. The dependency on $n$ is optimal, as a consequence of the $\Omega(\log^*n)$ lower bound of [Linial, SIAM J. Comp. 1992] for $(\Delta+1)$-coloring. An important special case of our result is a significant improvement over the best known algorithm for distributed $(\Delta+1)$-coloring due to [Barenboim, PODC 2015], which required $\tilde{O}(\Delta^{3/4})+\log^*n$ rounds. Improvements for other variants of coloring, including $(\Delta+1)$-list-coloring, $(2\Delta-1)$-edge-coloring, coloring with forbidden color distances, etc., also follow from our general result on conflict coloring. Likewise, in the framework of centralized local computation algorithms (LCAs), our general result yields an LCA which requires a smaller number of probes than the previously best known algorithm for vertex-coloring, and works for a wide range of coloring problems.
Joint work with Marc Heinrich and Adrian Kosowski.
Designing Local Algorithms with Algorithms
Speaker: Jukka Suomela
In this talk I will show that there are settings in which the design of local distributed algorithms can be automated. In essence, we can synthesize asymptotically optimal local algorithms for solving nontrivial graph problems.
I will focus on LCL problems (locally checkable labellings). These are problems in which valid solutions can be detected by checking the constant-radius neighborhoods of all nodes. Examples of such problems include maximal independent sets, maximal matchings, vertex coloring, and edge coloring.
We will study LCL problems on 2-dimensional grids (more precisely, toroidal n x n grids with consistent orientations and unique identifiers). In this setting, each LCL problem has a complexity of precisely O(1), Theta(log^* n), or Theta(n) rounds. There are no problems with an "intermediate complexity" of, e.g., Theta(log n); all problems are either local or global.
The bad news is that it is undecidable to tell whether a given LCL problem is local or global. However, we show that this is the only obstacle for algorithm synthesis: if we are told (or make a lucky guess) that the problem is local, then we can use a computer program to find an O(log^* n)-round algorithm for solving it! Furthermore, this is not merely a theoretical construction: we have successfully used computers to design local algorithms for numerous LCL problems, many of which do not have any human-designed algorithms. Perhaps the most interesting case is a computer-designed local algorithm for 4-coloring, complemented with a human-designed proof that 3-coloring cannot be solved locally.
This is joint work with Sebastian Brandt, Orr Fischer, Juho Hirvonen, Janne H. Korhonen, Tuomo Lempiäinen, Christopher Purcell, Joel Rybicki, Patric Östergård, and Przemysław Uznański.
Random walks approach in property testing
Speaker: Artur Czumaj
I’ll survey some of the recent results in graph property testing that rely on the random walk approach, and then I’ll briefly discuss some of the challenges of this approach.
Approximating the Spectrum of a Graph
Speaker: Christian Sohler
The spectrum of a graph $G=(V,E)$ with adjacency matrix $A$ consists of the eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$. We study the problem of approximating the spectrum $\lambda = (\lambda_1,\dots,\lambda_{|V|})$, $0 \le \lambda_1 \le \dots \le \lambda_{|V|} \le 2$, of $G$ when the algorithm can query random vertices of $G$ and, for a given vertex, a random neighbor in $O(1)$ time. We present a sublinear time algorithm that computes a succinct representation of an approximation $\widetilde{\lambda} = (\widetilde{\lambda}_1,\dots,\widetilde{\lambda}_{|V|})$, $0 \le \widetilde{\lambda}_1 \le \dots \le \widetilde{\lambda}_{|V|} \le 2$, such that $\|\widetilde{\lambda} - \lambda\|_1 \le \epsilon n$. Our algorithm has query complexity and running time $\exp(O(1/\epsilon))$, independent of $|V|$.
We then study the implications of our algorithm to property testing in the bounded degree graph model.
Joint work with David Cohen-Steiner and Gregory Valiant.
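A sketch of the kind of local primitive such a sublinear algorithm can build on (illustrative only; the paper recovers the full spectrum, not just moments): the k-th spectral moment of the random-walk matrix D^{-1}A equals the average k-step return probability, which random walks can estimate without reading the whole graph.

```python
import random

def moment_estimate(adj, k, samples=2000, seed=0):
    """Estimate (1/n) * trace((D^{-1} A)^k), i.e. the average k-step return
    probability, by sampling uniform start vertices and simulating k-step
    random walks over the adjacency-list oracle adj."""
    rng = random.Random(seed)
    verts = list(adj)
    hits = 0
    for _ in range(samples):
        start = rng.choice(verts)
        v = start
        for _ in range(k):
            v = rng.choice(adj[v])    # one random-walk step (a neighbor probe)
        if v == start:
            hits += 1
    return hits / samples
```

On the complete graph K4 (no self-loops), the exact 2-step return probability is 1/3, and a 1-step walk can never return, so the k=1 moment is exactly zero.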
Poster session
 Maryam Aliakbarpour
 Alkida Balliu: Local Distributed Verification
 Clement Canonne
 Laurent Feuilloley
 Meiram Murzabulatov: Tolerant Testers of Image Properties
 Ali Vakilian
 Nithin Varma: Erasure-resilient property testing
 Gandikota Venkata: Local Testing for Membership in Lattices
Fast Constrained Submodular Maximization
Speaker: Amin Karbasi
Can we summarize multi-category data based on user preferences in a scalable manner? Many utility functions used for data summarization satisfy submodularity, a natural diminishing-returns property. We cast personalized data summarization as an instance of a general submodular maximization problem subject to multiple constraints. We develop the first practical and FAst coNsTrained submOdular Maximization algorithm, FANTOM, with strong theoretical guarantees. FANTOM maximizes a submodular function (not necessarily monotone) subject to the intersection of a p-system and l knapsack constraints. It achieves a (p+1)(2p+2l+1)/p approximation guarantee with only O(nrp log(n)) query complexity (n and r indicate the size of the ground set and the size of the largest feasible solution, respectively). We then show how we can use FANTOM for personalized data summarization. In particular, a p-system can model different aspects of the data, such as categories or time stamps, from which the users choose. In addition, knapsacks encode users' constraints, including budget or time. In our experiments, we consider several concrete applications: movie recommendation over 11K movies, personalized image summarization with 10K images, and revenue maximization on the YouTube social network with 5000 communities. We observe that FANTOM consistently provides the highest utility against all the baselines.
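For context, a minimal sketch of the classical greedy for a coverage objective under a cardinality constraint, which is the simplest special case of the p-system-plus-knapsack constraints FANTOM handles (all names are illustrative; FANTOM itself is considerably more general):

```python
def greedy_max_coverage(universe_sets, k):
    """Greedy maximization of the (monotone submodular) coverage function:
    repeatedly pick the item with the largest marginal gain, up to k items.
    universe_sets: dict mapping item -> set of elements it covers."""
    chosen, covered = [], set()
    for _ in range(k):
        best, gain = None, 0
        for item, s in universe_sets.items():
            if item in chosen:
                continue
            g = len(s - covered)          # marginal (diminishing) gain
            if g > gain:
                best, gain = item, g
        if best is None:                  # no item adds anything new
            break
        chosen.append(best)
        covered |= universe_sets[best]
    return chosen, covered
```

The diminishing-returns property is what makes the greedy choice meaningful: each item's marginal gain can only shrink as the covered set grows.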
Sublinear Random-Access Generators for Preferential Attachment Graphs
Speaker: Adi Rosen
We consider the problem of giving a user access to a graph sampled according to a certain probability distribution, and are interested in the time, space, and randomness complexities of such samplers. We are specifically interested in the case when the distribution is defined by a sequentially evolving graph model.
In the standard approach, the whole graph is sampled according to the probability distribution, then stored, and later the user can issue certain queries, e.g., asking for the adjacency list of a given node. This, however, may require prohibitive amounts of time, space, and random bits, especially when the graph is big and only a few queries are actually issued. Instead, we propose to generate the graph on the fly, in response to queries, while respecting the probability distribution defined by the sequential model.
We focus on the Barabási–Albert Preferential Attachment model (BA graphs), and give on-the-fly generation algorithms with the following properties: let n denote the number of nodes in the underlying sampled graph; with probability 1-1/poly(n), each query is answered in polylog(n) time, and for each query both the increase in space and the amount of random bits used are polylog(n). The answers that the user gets to its queries are (with probability 1) distributed according to the probability distribution which is the projection, on the queried nodes, of the original distribution on graphs.
Joint work with Guy Even, Reut Levi, and Moti Medina.
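For reference, a sketch of the sequential process being sampled (a standard Barabási–Albert generator; the talk's contribution is precisely to answer adjacency queries without running this whole loop):

```python
import random

def barabasi_albert(n, m, seed=0):
    """Sequential preferential attachment: node t connects to up to m earlier
    nodes chosen with probability proportional to their current degree,
    via the classic repeated-endpoints trick (vertex v appears deg(v) times
    in the endpoints list)."""
    rng = random.Random(seed)
    endpoints = [1, 0]            # start from a single edge between 0 and 1
    edges = [(1, 0)]
    for t in range(2, n):
        # sampling m endpoints may yield fewer than m distinct targets
        targets = {rng.choice(endpoints) for _ in range(m)}
        for u in sorted(targets):
            edges.append((t, u))
            endpoints += [t, u]   # both endpoints gain one unit of degree
    return edges
```

Every edge points from a later node to an earlier one, which is exactly the sequential structure that an on-the-fly generator must reproduce lazily.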
Robust Learning and the Semi-Verified Model
Speaker: Greg Valiant
How can the availability of a very small amount of high-quality or trusted data enable the efficient extraction of the information contained in a large but less-trusted dataset? We investigate this question via the introduction of a new model for studying robust estimation, learning, and optimization, which we term the "semi-verified" model. In this model, we assume a large dataset of which a subset consists of points drawn from the distribution of interest, and make no assumptions on the remaining points—they could be well-behaved, extremely biased, or even adversarially generated as a function of the remaining points. Additionally we assume a much smaller (typically constant-sized) set of "verified" or trusted data points that have been drawn from the distribution of interest. We present several results in this model, for a large class of mathematically clean and practically relevant robust estimation and learning tasks. These results include both local and non-local algorithms that achieve information-theoretic limits for two different parameter regimes.
The talk is based on joint works with Jacob Steinhardt and Moses Charikar, and with Michela Meister.
Heavy hitters via cluster-preserving clustering
Speaker: Huy L. Nguyen
In the turnstile L_p heavy hitters problem with parameter eps, one must maintain a high-dimensional vector x in R^n subject to updates of the form update(i, Delta), which increases x_i by Delta, where i is in [n] and Delta is an integer. Upon receiving a query, the goal is to report every "heavy hitter" i in [n] with |x_i| >= eps * ||x||_p as part of a list L, which is a subset of [n] of size O(1/eps^p), i.e., proportional to the maximum possible number of heavy hitters. In fact we solve the stronger tail version, in which L should include every i such that |x_i| >= eps * ||x_{proj(1/eps^p)}||_p and |x_i| > 0, where x_{proj(k)} denotes the vector obtained by zeroing out the largest k entries of x in magnitude.
We provide a new algorithm which, in the most general turnstile model, achieves optimal O(eps^{-p} log n) space, O(log n) update time, and fast O(eps^{-p} poly(log n)) query time, providing correctness with high probability. Our main innovation is an efficient reduction from the heavy hitters problem to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our cluster-preserving clustering may be of broader interest, well beyond heavy hitters.
Joint work with Kasper Green Larsen, Jelani Nelson, and Mikkel Thorup.
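This is not the paper's algorithm (which rests on the cluster-preserving reduction described above); for orientation, here is a textbook Count-Min sketch showing the turnstile update/query interface the problem statement assumes:

```python
import random

class CountMin:
    """Count-Min sketch: for nonnegative update streams, point queries
    overestimate x_i by a small additive error with high probability."""
    _P = (1 << 61) - 1  # a Mersenne prime for the pairwise-independent hashes

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # one (a, b) hash pair per row: h(i) = ((a*i + b) mod P) mod width
        self.rows = [(rng.randrange(1, self._P), rng.randrange(self._P))
                     for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cell(self, row, i):
        a, b = self.rows[row]
        return ((a * i + b) % self._P) % self.width

    def update(self, i, delta):
        # turnstile update: x_i += delta
        for row in range(len(self.rows)):
            self.table[row][self._cell(row, i)] += delta

    def estimate(self, i):
        # min over rows: valid overestimate when updates are nonnegative
        return min(self.table[row][self._cell(row, i)]
                   for row in range(len(self.rows)))
```

Note that Count-Min only answers point queries quickly; recovering the full heavy-hitter list efficiently in the turnstile model is exactly the harder problem the talk addresses.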
Slow inconsistent statistics
Speaker: Jason Morton
Traditionally, categorical data analysis (e.g., generalized linear models) works with simple, flat datasets akin to a single table in a database with no missing data. What if we have a distributed database with many partial local tables, and data is streaming in too fast for the local tables ever to be globally consistent? I'll discuss how some hints from physics (e.g., quantum nonlocality) could help define sensible models in this scenario.
Testing k-Monotonicity
Speaker: Elena Grigorescu
In this work we initiate the study of Boolean k-monotone functions in the Property Testing model. k-monotone functions defined over a finite poset domain D alternate between the values 0 and 1 at most k times on any ascending chain in D. Hence, k-monotone functions are natural generalizations of the classical monotone functions, which are the 1-monotone functions. This work is motivated by the recent interest in k-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing.
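Under a literal reading of the definition above (counting value changes along ascending chains, where on the hypercube a maximal chain flips coordinates from 0 to 1 one at a time), k-monotonicity can be checked by brute force. The sketch below is exponential in n and purely illustrative; it is not the testing algorithm of the talk:

```python
from itertools import permutations

def alternations(values):
    """Number of 0/1 value changes along a sequence."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)

def is_k_monotone(f, n, k):
    """Brute-force check over all maximal ascending chains of the
    n-dimensional hypercube. Each permutation of the coordinates
    gives one maximal chain: start at the all-zeros point and flip
    coordinates to 1 in that order."""
    for order in permutations(range(n)):
        x = [0] * n
        values = [f(tuple(x))]
        for i in order:
            x[i] = 1
            values.append(f(tuple(x)))
        if alternations(values) > k:
            return False
    return True
```

For example, parity on 3 variables alternates 3 times on every maximal chain, so it is 3-monotone but not 2-monotone, while OR alternates at most once.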
I will present our results on testing kmonotonicity on the hypercube and hypergrid domains, outline some connections with other problems from the literature, and discuss a few important open questions.
Joint work with Clement Canonne, Siyao Guo, Akash Kumar, Karl Wimmer.
HyperHeadTail: a Streaming Algorithm for Estimating the Degree Distribution of Dynamic Multigraphs
Speaker: Kevin Matulef
We present HyperHeadTail, a new streaming algorithm for estimating the degree distribution of a graph from a stream of edges using very little space. Our algorithm builds upon the HeadTail algorithm of Simpson, Comandur, and McGregor (2015), but extends it to handle repeat edges and sliding time windows, making it suitable for application to real-world streams such as those generated by network traffic or other communications networks. We implement HyperHeadTail, and demonstrate its utility on both synthetic and real-world data sets, showing that it effectively captures the degree distribution, with space requirements equal to only a small fraction of the graph.
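The basic idea of estimating a degree histogram from an edge stream by subsampling vertices can be sketched as below. This is a toy sketch under stated assumptions, not HyperHeadTail: it uses a hash-based coin so each vertex's sampling decision is consistent without storing per-vertex state, and it ignores repeat-edge handling and sliding windows:

```python
from collections import Counter

def _keep(w, p, salt=12345):
    # Hash-based coin flip: the same vertex always gets the same
    # decision, so no per-vertex sampling state is stored.
    return (hash((salt, w)) % 10**6) < p * 10**6

def sampled_degrees(edge_stream, p=0.1):
    """Estimate the degree histogram {degree: #vertices} of a
    multigraph from a stream of (u, v) edges, keeping exact degree
    counters only for an expected p-fraction of vertices."""
    degree = Counter()
    for u, v in edge_stream:
        for w in (u, v):
            if _keep(w, p):
                degree[w] += 1
    hist = Counter(degree.values())
    # Each vertex was retained with probability p, so scale up by 1/p.
    return {d: c / p for d, c in hist.items()}
```

Because a retained vertex is counted on every edge it appears in, its degree is exact; the estimator is unbiased but, unlike the talk's algorithm, wastes its budget uniformly across head and tail of the distribution.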
Computing with Unknown Noise Rate
Speaker: Jared Saia
How can we compute over a noisy channel? Say Alice and Bob want to run a distributed algorithm, but can only communicate over a channel where some number of bits may be flipped. What is the smallest number of bits they must send in order to obtain the correct output, with high probability? This general problem is called Interactive Communication (IC).
Several results in IC take a protocol requiring L bits of noisefree communication, and make it robust over a channel with adversarial noise. In a recent result, Haeupler described an algorithm that sends a number of bits that is conjectured to be near optimal. However, his algorithm critically requires a priori knowledge of the number of bits that will be flipped by the adversary.
In this talk, we describe an algorithm requiring no such knowledge. To achieve this result, our algorithm must make local decisions about communication redundancy, without access to global information about past and future noise rates. If an adversary flips T bits, our algorithm sends L + Õ(T + √(LT)) bits in expectation and succeeds with high probability in L. Assuming a conjectured lower bound by Haeupler, our result is optimal up to logarithmic factors.
Concurrent Disjoint Set Union
Speaker: Robert Tarjan
The disjoint set union problem is a classical problem in data structures with a simple and efficient sequential solution that has a notoriously complicated analysis. One application is to find strongly connected components in huge, implicitly defined graphs arising in model checking. In this application, the use of multiprocessors has the potential to produce significant speedups. We explore this possibility. We devise and analyze concurrent versions of standard sequential algorithms that use single and double compare-and-swap primitives for synchronization, making them wait-free. We obtain work bounds that grow logarithmically with the number of processors, suggesting the possibility of significant speedup in practice. This is joint work with Siddhartha Jayanti, an undergraduate at Princeton.
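For reference, the standard sequential building blocks can be sketched as follows: linking by rank plus path splitting, a one-pass variant of path compression in which each pointer update touches a single parent link (in the concurrent versions discussed in the talk, such single-pointer writes are the natural candidates for compare-and-swap; here they are plain writes):

```python
class DisjointSetUnion:
    """Sequential union-find with linking by rank and path splitting.
    Illustrative sketch of the classical algorithm, not the concurrent
    wait-free version from the talk."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path splitting: redirect each visited node to its grandparent
        # while walking up, shortening the path for future queries.
        while self.parent[x] != x:
            self.parent[x], x = self.parent[self.parent[x]], self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already
        in the same set."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra            # link lower-rank root under higher
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

With both optimizations, a sequence of m operations on n elements runs in O(m alpha(n)) time, the nearly-constant inverse-Ackermann bound whose analysis the abstract alludes to.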
Locality and Parallelism
Speaker: Krzysztof Onak
The last decade has seen an unprecedented success of massive parallel and distributed systems such as MapReduce, Spark, and Hadoop. Their computation consists of a small number of rounds in which machines first process local data and then exchange information. After introducing the computational model in more detail, I will discuss how its complexity relates to the notion of locality for graph problems. On the one hand, some more standard local algorithms can be simulated efficiently, taking additionally advantage of the global data exchange. On the other hand, graph exploration seems to be hindered similarly to the streaming model. Many interesting questions remain open in this relatively young area of research.
Attendee info
Hospitality Notice for University and Government Employees: Microsoft Research is providing hospitality at this event. Please consult with your institution to determine whether you can accept meals and other hospitality under your institution’s ethics rules and any other laws that might apply.
Hotel Accommodations
Take advantage of the hotel room block for this event and book at the Hyatt Regency Cambridge by September 22.
Arrival Guidance
Friday, October 14:
Microsoft Research New England
1st floor, 1 Memorial Drive
Cambridge, MA 02142
Upon arrival, be prepared to show a picture ID and sign the Building Visitor Log when approaching the Lobby Floor Security Desk. Alert them to the name of the event you are attending and ask them to direct you to the appropriate floor. The talks will be held on the first floor.
Saturday, October 15:
Massachusetts Institute of Technology
Kiva Seminar Room, 32G449, 32 Vassar Street
Cambridge, MA 02139
Enter Building 32 at the front entrance and proceed straight ahead; there will be elevators to the right. Take the elevators to the 4th floor; exit to the left and then turn right at the end of the elevator bank. At the end of the short corridor bear to the left and continue around the R&D Dining Room. CSAIL Headquarters will be to your left and the Patil/Kiva Seminar Room will be straight ahead.