June 28, 2010 - July 2, 2010

Summer School 2010

Location: Cambridge, England, U.K.

  • Overview of Microsoft Research and the Summer School – Andrew Blake (Microsoft Research)

    How to write a great research paper – Simon Peyton-Jones (Microsoft Research)

  • Presentation skills – Ken Shaw (Benchmark Communication Techniques)

    Lecture, presentation or conversation? We will examine: who your audience is; what they want; why you are addressing them; how you handle practical issues like nerves, body language, speech and voice, humour, visual aids etc.; what success looks like; what plan B is if everything goes wrong; and how you recover.

    Presentation content – Simon Peyton-Jones (Microsoft Research)

    Writing papers and giving talks are key skills for any researcher, but they aren’t easy. In this pair of presentations, I’ll describe simple guidelines that I follow for writing papers and giving talks, which I think may be useful to you too. I don’t have all the answers – far from it – and I hope that the presentation will evolve into a discussion in which you share your own insights, rather than a lecture.

    Rough guide to being an entrepreneur – Jack Lang (University of Cambridge)

    At some stage you might want to exploit your ideas by starting a company, just as Bill Gates and Paul Allen did in 1975. It might even be the next Microsoft, or bought by them. I’ll give an overview of the process, explain some of the success factors investors look for, and how to go about writing a business plan and getting off the ground.

    Parallel session

    Simulating global carbon-climate feedback – Drew Purves (Microsoft Research)

    We will provide an overview of the global carbon cycle and its potential roles in accelerating or mitigating historic and future climate change—and introduce a carbon-climate modeling system developed recently at Microsoft Research Cambridge. We will begin with a global summary of current natural and anthropogenic sources and sinks of carbon, explaining the methods scientists have used to estimate these numbers. Next, we will consider the scientific challenge of predicting the future of these sources and sinks, both in terms of the biological and socioeconomic processes that must be considered and in terms of the model structures, data sources, statistical machinery, and computational power—and, therefore, the novel software tools—needed to make the predictions more reliable. Key issues will be illustrated—and new predictions about the carbon cycle made—using the new carbon-climate modeling system, one instantiation of which can be seen in a related TechFest demo, Understanding and Preserving Life-Support Systems.

    Molecular programming – Luca Cardelli (Microsoft Research)

    Moore’s law is pushing technology towards smaller and smaller devices, and quite soon we will reach the ultimate goal: devices made of single molecules. At that point we will need to engineer systems one molecule at a time, using tools of comparable accuracy. Molecular engineering is already in full swing, but that usually means building ad-hoc molecular devices that lack programmability.

    Nucleic acids (DNA/RNA) encode information digitally, and are currently the only truly ‘user-programmable’ entities at the molecular scale. The fact that they have biological origin is incidental: they are just very handy engineering materials. DNA/RNA can be used to manufacture nano-scale and meso-scale structures, produce physical forces, act as sensors and actuators, and also to compute. They are unique in that they are both materials and carriers of information: they are programmable matter. Moreover, they can interface to biological entities, with enormous medical implications: we will be able to detect and cure diseases at the cellular level under program control.

    The (bio-)technology to create and manipulate nucleic acids has existed for many years, but the imagination necessary to exploit them programmatically has been evolving slowly. Recently, some simple computational schemes have been developed that are autonomous (run completely on their own once initialized) and involve only short, easily synthesizable DNA strands with no other complex molecules. Since DNA computation is massively concurrent, some tricky and yet familiar issues arise: the need to formally analyze and verify molecular programs to avoid subtle deadlocks and race conditions, and the need to design high-level languages and compilers that exploit concurrency and stochasticity.

    Molecular programming is the emerging discipline of designing and constructing molecular systems that behave algorithmically, in carrying out computation, in forming physical and dynamic structures, or both.

    Parallel session

    Infer.NET and probabilistic programming – John Winn (Microsoft Research)

    Would you like to write software that can adapt to new situations, learn from examples or work with uncertain information? Infer.NET is a machine learning framework that lets you build such capabilities easily using a new way of programming called probabilistic programming. Probabilistic programs can work with uncertain or unknown variables and even uncertain execution. By using such programs, you can combine detailed domain knowledge with the latest machine learning algorithms to generate tailored code to solve your problem. I’ll explain what probabilistic programming is and give some examples of using it for search and for online gaming.

    Ten things you don’t know about Microsoft – Derick Campbell (Microsoft Research)

    Think you know everything about Microsoft? Join the fun and learn tips and tricks about Microsoft software that can help scientists in their research, see cool new technologies, and gain insight into Microsoft culture and history.

  • From data to knowledge – Sydney Brenner (Salk Institute)

    This is the great challenge for biology today. It is also the great challenge for computer science; bioinformatics is not enough and computational biology is still in its infancy. We have to have a theory of the computational architecture of biological systems, of how outcomes are generated in the hardware of living systems. Computational biology must reflect biological computation. I will discuss this in my talk and give some examples as to how this can inform knowledge systems for biology.

    Parallel session

    Fun with F#: Solving complex problems with simple code – Anton Schwaighofer (Microsoft)

    I’ll give an introduction to F#, a functional programming language that is very well suited to expressing complex ideas and problems. I’ll start with the key ideas of functional programming, and contrast simple examples with traditional programming languages such as Matlab or C. In addition, I will present two particular features of F# that make it very well suited for data-intensive scientific computing: it is very easy to do parallel programming, and units of measure provide valuable sanity checks for mathematical expressions.

    Simulation and data analysis with Windows Azure – Austin Donnelly (Microsoft Research)

    Research is being increasingly driven by the generation and analysis of datasets. Traditionally, datasets were manually entered into computers, but today’s datasets can be huge because they are computer generated: either the results of simulations or automated observations of the real world. In this talk I’ll describe Windows Azure, an environment for running your code on servers located around the world in Microsoft’s datacenters. This allows you to run large-scale simulations, and analyse the results before your paper deadline!

    From driving to trafficking: the developing view of the user in computer systems design – Richard Harper (Microsoft Research)

    In this talk I shall introduce the role of the user in computer systems design, explore how understanding of who and what the user is has developed over the years with examples, and report on how current research at Microsoft is designing for forms of user behaviours that were not imagined just a few years ago.

    Presentations of past students

    From program analysis research to industrial programming language development – Andy Maule (PhD at University College London; now at Microsoft Corp)

    During my PhD I was offered the chance to be an intern with the Oslo product team at Microsoft in Redmond. As a research student in program analysis with some industrial experience, I was in the privileged position of being able to work on the design and implementation of a new programming language called M. After completing my PhD I accepted a job offer to continue working with the Oslo team on the M programming language. In this talk I will describe my PhD work, how it led me to my current job and what it’s like as a former research student in industry.

    Static contract checking for Haskell – Dana N. Xu (PhD at the University of Cambridge; now at INRIA)

    Program errors are hard to detect and are costly both to programmers, who spend significant effort debugging, and to systems that are guarded by runtime checks. Static verification techniques have been applied to imperative and object-oriented languages, like Java and C#, but few have been applied to a higher-order lazy functional language, like Haskell. In this talk, I will describe a sound and automatic static verification framework for Haskell that is based on contracts and symbolic execution. Our approach is modular and gives precise blame assignments at compile-time in the presence of higher-order functions and laziness.

    Communications, travel and social networks – Lynne Hamill (University of Surrey)

    The title of my PhD thesis was Communications, Travel and Social Networks since 1840: A Study Using Agent-based Models. The basic idea underlying the thesis was that the more we communicate, the more we travel. Agent-based modelling is a new method of social simulation. In this talk, I will explain what I did, the key results, and what I plan next. I will then offer some suggestions about successfully completing a PhD.

    Presentations of past students

    Tracking and localisation for speech and robotics – Maurice Fallon (PhD at the University of Cambridge; now at MIT)

    The Speech Source Tracking (my PhD work) and Robotic Localisation and Mapping (my postdoc work) problems have been characterized using a very similar probabilistic framework. Nonetheless the techniques used to solve the problems are strikingly different – as a result of the vastly different input sensor data. The first part of the talk concerns speaker localization, detection and tracking using noisy recordings from low-cost microphones. The approach taken utilizes Particle Filtering techniques drawn from the target tracking community. Meanwhile, robotic mapping is characterised by high-rate, high-quality ranging sensors on mobile autonomous platforms – in this case a robotic kayak where optimization and adjustment are more pressing issues. Results are presented for the vehicle exploring in an uncertain natural environment.
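
    As a rough illustration of the particle filtering idea mentioned in this abstract, the sketch below tracks a single scalar position from noisy measurements. It is not the speaker's method: the one-dimensional random-walk motion model, the Gaussian noise levels and the function name `particle_filter` are all assumptions chosen for brevity.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_std=1.0,
                    obs_std=2.0, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state (illustrative only)."""
    rng = random.Random(seed)
    # Assumed prior: the position starts somewhere around 0 with wide uncertainty.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: move every particle according to the assumed random-walk model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: score each particle by the Gaussian likelihood of the measurement.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Estimate: weighted mean of the particle positions.
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    # A stationary source at position 10, observed through a constant reading.
    track = particle_filter([10.0] * 30)
    print(round(track[-1], 2))
```

    The predict, weight and resample steps stay the same whatever the sensor; only the motion and measurement models change, which is where a microphone array and a ranging sensor would differ.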

    Acquiring syntactic and semantic transformations in question answering – Michael Kaisser (PhD at the University of Edinburgh; now at Microsoft Bing STC Europe)

    One and the same fact in natural language can be expressed in many different ways by using different words and/or a different syntax. This phenomenon, commonly called paraphrasing, is the main reason why Natural Language Processing (NLP) is such a challenging task. This becomes especially obvious in Question Answering (QA) where the task is to automatically answer a question posed in natural language, usually in a text collection also consisting of natural language texts. It cannot be assumed that an answer sentence to a question uses the same words as the question and that these words are combined in the same way by using the same syntactic rules.

    In my thesis I describe methods that can help to address this problem. Firstly, I explore how lexical resources, namely FrameNet, PropBank and VerbNet, can be used to recognize a wide range of syntactic realizations that an answer sentence to a given question can have. Furthermore, I use a corpus of question and answer sentence pairs (QASPs) to develop an approach to QA based on matching dependency relations between answer candidates and question constituents in the answer sentences. In this talk, I will describe these two approaches in more detail and present evaluation results.

    To infinity and beyond with nonparametric Bayesian methods – Jurgen Van Gael (University of Cambridge)

    Probabilistic models in machine learning are widely used in science and industry. Traditionally, these models have been set up assuming a small set of unknowns which need to be learned from data. As the amount of data grows, the extra data just leads to a few extra digits of accuracy in our estimates. Nonparametric Bayesian methods are a family of techniques to make better use of data by allowing models to have an infinite number of parameters and letting the data decide how many to actually learn. In this talk I will illustrate how these types of techniques can be used to build a part-of-speech tagger without knowing anything about parts of speech!
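
    One standard way to picture "letting the data decide" how many parameters to use is the Chinese restaurant process, a common building block of nonparametric Bayesian models. The sketch below is a generic illustration and is not taken from the talk: the function name, the concentration parameter `alpha` and the seed are assumptions.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample cluster sizes from a Chinese restaurant process (illustrative only)."""
    rng = random.Random(seed)
    tables = []  # tables[k] is the number of customers seated at table k
    for i in range(n_customers):
        # Customer i joins table k with probability tables[k] / (i + alpha)
        # and opens a new table with probability alpha / (i + alpha).
        r = rng.uniform(0.0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:
            # No existing table was chosen, so a new cluster is created.
            tables.append(1)
    return tables


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, len(chinese_restaurant_process(n, alpha=1.0)))
```

    The number of occupied tables is not fixed in advance: it tends to grow slowly as more customers (data points) arrive, so the effective number of clusters is set by the data rather than by the modeller.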

  • Cloud computing for research – Fabrizio Gagliardi (Microsoft Research)

    In this talk I will give a general overview of Cloud Computing and show how it is revolutionising the way we can do scientific research. As an example I will present Venus-C, a European project developing a Cloud Computing service for the research community.

    Parallel sessions

    Introduction to intellectual property – Carole Boelitz (Microsoft)

    I will present the different types of intellectual property and how those rights can be obtained. I will also discuss some of the factors Microsoft uses for determining whether potential intellectual property rights are worth protecting and when we may prefer instead to share our work openly. Finally, I will talk about some of the more common issues we encounter when collaborating with other people or using materials created outside the company.

    Poster presentations for non-native English speakers – Sue Duraikan (Duraikan Training)

    Presenting a poster at a conference is a terrific opportunity to promote your research and raise your professional profile in the wider academic community. However, it can be daunting to compete with other presenters to get the attention of a passing audience. As well as a clear and captivating poster, you need the ability to build rapport quickly and present your subject positively and succinctly. This can be especially challenging when you are presenting in a language which is not your mother tongue.

    During the three poster sessions, Sue will be hovering in the room, watching and listening to your approach. On the final day, she will highlight the key thought processes as well as the verbal and non-verbal skills you need to give a powerful poster presentation.