May 15, 2014

Workshop on Algorithms and Data Science

Location: Cambridge UK

  • Dan Alistarh, Microsoft Research |Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always make progress. Unfortunately, designing wait-free algorithms is generally a very complex task, and the resulting algorithms are not always efficient. While obtaining efficient wait-free algorithms has been a long-time goal for the theory community, most non-blocking commercial code is only lock-free.

    Our work suggests a simple solution to this problem. We show that, for a large class of lock-free algorithms, under scheduling conditions which approximate those found in commercial hardware architectures, lock-free algorithms behave as if they are wait-free. In other words, under certain conditions, programmers can keep on designing simple lock-free algorithms, and in practice, they will get wait-free progress.
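    The contrast between the two progress guarantees is easiest to see in a retry loop. The sketch below is illustrative only (not the algorithms from this work): a lock-free counter increment in Python, where the `AtomicCounter` class stands in for a hardware compare-and-swap. A thread whose CAS fails retries, but a failed CAS means some other thread's CAS succeeded, so the system as a whole makes progress.

```python
import threading

class AtomicCounter:
    """Toy CAS cell: the internal lock stands in for a hardware
    compare-and-swap instruction; the retry loop below it is what
    makes the increment lock-free in structure."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Atomically: if the cell still holds `expected`, install `new`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def lock_free_increment(counter):
    # Classic lock-free pattern: read, compute, CAS, retry on contention.
    # An individual thread may retry arbitrarily often (so this is not
    # wait-free), but every failed CAS implies another thread succeeded.
    while True:
        old = counter.load()
        if counter.compare_and_swap(old, old + 1):
            return old + 1
```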

  • Graham Cormode, University of Warwick |When handling large quantities of data, it is desirable to outsource the computational effort to a third party: this captures current efforts in cloud computing, but also scenarios within trusted computing systems and inter-organizational data sharing. When the third party is not fully trusted, it is desirable to give assurance that the computation has been performed correctly. This poster presents some recent results in designing new protocols for verifying computations which are streaming in nature: the verifier (data owner) needs only a single pass over the input, storing a sublinear amount of information, and follows a simple protocol with a prover (service provider) that takes a small number of rounds. A dishonest prover fools the verifier with vanishingly small probability, while an honest prover’s answer is always accepted.

    Interactive proof techniques allow a quite general class of computations to be verified, leading to practical implementations.
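    As a flavour of such checks, the toy below (an illustration of a standard fingerprinting building block, not the poster's protocols) verifies a prover's claimed multiset against a stream: the verifier evaluates a polynomial fingerprint at a random point, which it can maintain in a single pass using constant space, and two different multisets collide only with vanishingly small probability.

```python
import random

P = (1 << 61) - 1  # a large Mersenne prime modulus

def fingerprint(items, r):
    # prod (r - x) mod P: equal multisets give equal fingerprints, and
    # different multisets agree at a random r with probability <= len/P.
    f = 1
    for x in items:
        f = f * (r - x) % P
    return f

def verify_claimed_multiset(stream, claimed):
    # The verifier streams the input once, keeping only the running
    # product (one word of state), then compares against the prover's claim.
    r = random.randrange(P)
    return fingerprint(stream, r) == fingerprint(claimed, r)
```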

  • Neil Dalchau, Microsoft Research |The ability to reliably re-program cells has great potential for tackling societal challenges, in areas such as health, food and energy. DNA is the code that stores the information required to replicate life, and is read by cellular machinery. Beyond this natural role, DNA can also be used as a substrate for computing, by taking advantage of its physical properties. This gives rise to the potential use of DNA as an intracellular controller, improving cell-based production of biofuels and medicines, and even acting as “smart drugs”. This poster will show how we have used DNA strand displacement to create complex dynamical behaviours. Network designs benefit from the Visual DSD software, developed within Microsoft Research, to build, analyse and test models of DNA strand displacement networks. We formally implement chemical reaction networks (CRNs), which are capable of a rich set of nonlinear dynamics. By combining DNA implementations of non-catalytic and autocatalytic reactions, we construct a circuit that implements a well-studied approximate majority algorithm, one of the fastest known ways for a population to reach a collective majority decision between two possible outcomes. The experimentally observed dynamics of the full circuit are well predicted by a model whose parameters are inferred from experimental measurements of the constituent reactions using Markov chain Monte Carlo parameter estimation. The predictability of the dynamics of DNA-based computing architectures could one day lead to their use in a clinical context as sophisticated molecular controllers.
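    The approximate majority algorithm itself is easy to simulate at the population level. The sketch below is a toy simulation of the underlying reaction scheme (not of the DNA implementation or of Visual DSD): random pairs of agents interact until the population reaches consensus.

```python
import random

def approximate_majority(n_x, n_y, seed=0):
    # Toy population-level simulation of approximate majority, with a
    # "blank" third state B and three reactions:
    #   X + Y -> B + B   (a disagreeing pair both go blank)
    #   X + B -> X + X   (autocatalytic: X recruits a blank)
    #   Y + B -> Y + Y
    rng = random.Random(seed)
    pop = ['X'] * n_x + ['Y'] * n_y
    while len(set(pop)) > 1:          # stop once the population is uniform
        i, j = rng.sample(range(len(pop)), 2)
        a, b = pop[i], pop[j]
        if a == b:
            continue                  # no applicable reaction
        if 'B' not in (a, b):         # X meets Y: both become blank
            pop[i] = pop[j] = 'B'
        else:                         # a blank is recruited by the other state
            winner = a if a != 'B' else b
            pop[i] = pop[j] = winner
    return pop[0]                     # 'X', 'Y', or (rarely) 'B'
```

    With a clear initial majority, the initial-majority state wins with high probability, and consensus is reached quickly.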

  • Fajwel Fogel, École Normale Supérieure |We study the problem of ranking a set of n items given pairwise comparisons.

    We show that the ranking problem is directly connected to another classical ordering problem, namely, seriation. In particular, we show how to construct a similarity matrix based on pairwise comparisons, and, in the noiseless case, recover the true ranking by applying the spectral seriation algorithm to the similarity matrix.

    In the noisy case, we show that spectral seriation can exactly recover the true ranking even when some of the pairwise comparisons are either corrupted or missing, provided that the pattern of errors is relatively unstructured.

    Finally, we perform numerical experiments which suggest that our method is also competitive in settings where the input data consist of only a small fraction of all possible comparisons.
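    The noiseless pipeline can be sketched as follows (an illustrative reconstruction using numpy, not the authors' code): build a similarity matrix from the comparison matrix, then order items by the Fiedler vector of its Laplacian.

```python
import numpy as np

def rank_by_seriation(C):
    # C[i, j] = +1 if item i beats item j, -1 if it loses, 0 on the diagonal.
    n = C.shape[0]
    # Similarity: items with nearby ranks agree on most comparisons, so in
    # the noiseless complete case S is a Robinson matrix (entries decrease
    # away from the diagonal).
    S = C @ C.T
    S = S - S.min()                   # nonnegative weights (uniform shift
                                      # only shifts the Laplacian spectrum)
    L = np.diag(S.sum(axis=1)) - S    # graph Laplacian of the similarity
    vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = vecs[:, 1]              # eigenvector of 2nd-smallest eigenvalue
    return np.argsort(fiedler)        # the ranking, up to reversal
```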

  • Christos Gkantsidis, Microsoft Research |Traditional systems for stream processing have focused on queries over high-rate data streams, typically by scaling out the work. However, with the emergence of personal assistants such as Cortana, we are faced with a different type of query: low-rate, but computationally more expressive. In addition, the back-end supporting those queries should be able to cope with potentially huge numbers of such small queries, many of which may be similar (e.g. many users may require weather updates for the same area). The placement of queries on servers in the back-end should maximize opportunities for sharing work among queries while, at the same time, balancing the load across servers. In this demo, we describe the algorithms we have designed to minimize network traffic and balance load for Reactor (the system that executes the queries created by Cortana).
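    A minimal sketch of the placement trade-off (hypothetical; Reactor's actual algorithms are not shown here): route a query to a server already running the same query template, so the work is shared, and otherwise fall back to the least-loaded server.

```python
def place_queries(queries, n_servers):
    # Hypothetical greedy placement, for illustration only: sharing a
    # template adds no load, so prefer a server that already runs it.
    servers = [dict() for _ in range(n_servers)]  # template -> #subscribers
    loads = [0] * n_servers                       # distinct templates per server
    assignment = []
    for template in queries:
        sharing = [s for s in range(n_servers) if template in servers[s]]
        if sharing:
            s = sharing[0]                        # share existing work
        else:
            s = min(range(n_servers), key=loads.__getitem__)
            loads[s] += 1                         # a new template adds load
        servers[s][template] = servers[s].get(template, 0) + 1
        assignment.append(s)
    return assignment, loads
```

    For example, five identical weather queries all land on one server and cost it one unit of load, while a distinct query goes to the other server, keeping the load balanced.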

  • Konstantinos Karanasos, Microsoft Research |In content-based publish-subscribe (pub/sub) systems, users express their interests as queries over a stream of publications. Scaling content-based pub/sub to very large numbers of subscriptions is challenging: users are interested in low latency, that is, getting subscription results fast, while the pub/sub system provider is mostly interested in scale, i.e. being able to serve large numbers of subscribers with low computational resource utilization. In this poster I will present a novel approach for scalable content-based pub/sub in the presence of constraints on the available CPU and network resources, implemented within our pub/sub system Delta. We achieve scalability by off-loading some subscriptions from the pub/sub server and leveraging view-based query rewriting to feed these subscriptions from the data accumulated in others. The main contribution is a novel algorithm for organizing views in a multi-level dissemination network, exploiting view-based rewriting and powerful linear programming capabilities to scale to many views, respect capacity constraints, and minimize latency. The efficiency and effectiveness of our algorithm are confirmed through extensive experiments and a large deployment in a WAN. This is joint work with Asterios Katsifodimos and Ioana Manolescu.
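    The off-loading idea can be illustrated with a toy containment check (a hypothetical one-level sketch; Delta's multi-level, LP-based organization is much richer than this): a subscription need not be served directly if an already-served subscription's results contain it.

```python
def can_feed(view, sub):
    # `view` and `sub` map attribute -> set of accepted values.  A view can
    # feed a subscription if it is at least as general: every attribute the
    # view constrains is constrained at least as tightly by the subscription.
    return all(attr in sub and sub[attr] <= allowed
               for attr, allowed in view.items())

def organize(subscriptions):
    # One-level off-loading: serve a subscription at the server only if no
    # earlier subscription's result stream contains it; otherwise feed it
    # from that subscription, saving server work.
    served, fed_from = [], {}
    for name, pred in subscriptions:
        parent = next((n for n, p in served if can_feed(p, pred)), None)
        if parent is None:
            served.append((name, pred))
        else:
            fed_from[name] = parent
    return [n for n, _ in served], fed_from
```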

  • Thomas Sauerwald, University of Cambridge |We consider the problem of balancing load items (tokens) on networks.

    Starting from an arbitrary load distribution, nodes may exchange tokens with their neighbours in each round. The goal is to obtain a load distribution where all nodes have the same number of tokens. For the continuous case, where tokens are arbitrarily divisible, most load balancing schemes correspond to Markov chains whose convergence is fairly well understood.

    However, in many applications load items cannot be divided arbitrarily often, and we need to deal with the discrete case where load is composed of indivisible tokens. We investigate a natural randomised protocol and demonstrate that there is almost no difference between the discrete and continuous cases. Specifically, we show that for any network, all nodes have the same number of tokens up to an additive constant within the same number of rounds as in the continuous case.
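    A toy version of such a randomised protocol (an illustration on a cycle, not necessarily the protocol analysed in this work): repeatedly pick a random edge and let its endpoints split their combined tokens evenly, giving any odd leftover token to a random endpoint.

```python
import random

def balance(load, rounds, seed=0):
    # Each step: pick a random edge of the cycle, average the two endpoint
    # loads, and assign the odd leftover token (if any) to a random endpoint.
    # The randomised tie-breaking is what keeps the discrete process close
    # to its continuous (arbitrarily divisible) counterpart.
    rng = random.Random(seed)
    n = len(load)
    for _ in range(rounds):
        i = rng.randrange(n)
        j = (i + 1) % n               # neighbour on the cycle
        total = load[i] + load[j]
        low, high = total // 2, total - total // 2
        if rng.random() < 0.5:
            load[i], load[j] = low, high
        else:
            load[i], load[j] = high, low
    return load
```

    The total number of tokens is conserved, and the discrepancy between the most and least loaded nodes shrinks quickly.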

  • Milan Vojnovic, Microsoft Research |Abstract TBD.