Machine Intelligence and Perception

The Machine Intelligence & Perception group comprises world-class scientists, engineers, entrepreneurs, and visionaries.

We share a dream of a brighter future: a future where machines understand and interact naturally with people, where computers enhance human creativity, where intelligent systems navigate and interact with complex environments, and where algorithms help us lead healthier lives.

We are driven by scientific curiosity, the thrill of discovery, the challenge of solving ambitious, real-world problems, and the daily opportunity to work with and learn from an amazing group of people. We join forces with our colleagues around the world: programming language gurus, mathematicians, physicists, biologists, social scientists, designers. We play a key role in the academic community, publishing prize-winning papers, supervising PhD students, and releasing code. And we offer unique opportunities to lead Microsoft forward and enable ground-breaking new products.

Research Themes

Deep Program Understanding

In the deep program understanding effort we aim to teach machines to understand complex algorithms, combining methods from the programming languages and machine learning communities. We hope that building algorithmic reasoning into AI systems will enable machines to understand highly structured data and natural processes, as well as empower developers with smart software engineering tools.

Multi-agent Learning

Our research on multi-agent learning aims to develop intelligent agents that can collaborate with people, in applications ranging from video games to assistive technology. As we endeavour to unravel the principles of multi-agent learning and collaboration, our research is facilitated by Project Malmo, our open-source experimentation platform built on the game Minecraft.

Asynchronous Distributed Neural Network Training

Inspired by the recent development of specialized hardware for deep learning, we are studying new ways to train neural networks using a large number of fast compute devices. We propose an asynchronous model-parallel training algorithm to achieve high device utilization and fast convergence. The proposed algorithm can naturally handle varying computational workloads and is therefore well suited to training neural network models that exhibit data-dependent dynamic computation flow (e.g., Tree RNN, Graph RNN, etc.).

Stochastic Neural Networks

In the stochastic neural network project we aim to build the next generation of deep learning models, which are more data-efficient and can eventually enable machines to be truly creative.

TrueSkill Ranking System

The TrueSkill ranking system is a skill-based ranking system for Xbox Live developed at Microsoft Research. The purpose of a ranking system is to identify and track the skills of gamers in a game (mode) so that they can be matched into competitive matches. The TrueSkill ranking system uses only the final standings of all teams in a game to update the skill estimates (ranks) of all gamers playing in that game.
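To make the idea of updating skill estimates from final standings concrete, here is a minimal sketch of a TrueSkill-style update for the simplest case: two players, one game, no draws. It assumes each skill is modelled as a Gaussian with mean mu and standard deviation sigma, and it uses the commonly quoted defaults (mu = 25, sigma = 25/3, beta = 25/6). The names Rating and update_two_player are hypothetical, and the sketch leaves out teams, draws, and the per-game dynamics noise; it is not the production implementation.

```python
import math
from typing import NamedTuple, Tuple


class Rating(NamedTuple):
    """Gaussian belief about a player's skill (hypothetical helper type)."""
    mu: float      # estimated skill
    sigma: float   # uncertainty about that estimate


BETA = 25.0 / 6.0  # assumed performance noise per game


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_two_player(winner: Rating, loser: Rating,
                      beta: float = BETA) -> Tuple[Rating, Rating]:
    """Update both ratings given only the outcome 'winner beat loser'."""
    # Total standard deviation of the performance difference.
    c = math.sqrt(2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2)
    # Normalised skill gap: large and positive means the win was expected.
    t = (winner.mu - loser.mu) / c
    # Correction factors of a truncated Gaussian (no draw margin).
    # Note: _cdf(t) can underflow for extremely lopsided upsets.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)

    # Means move apart; the more uncertain player moves further.
    new_winner_mu = winner.mu + (winner.sigma ** 2 / c) * v
    new_loser_mu = loser.mu - (loser.sigma ** 2 / c) * v
    # Uncertainty shrinks for both players after observing a game.
    new_winner_sigma = winner.sigma * math.sqrt(1.0 - (winner.sigma ** 2 / c ** 2) * w)
    new_loser_sigma = loser.sigma * math.sqrt(1.0 - (loser.sigma ** 2 / c ** 2) * w)

    return (Rating(new_winner_mu, new_winner_sigma),
            Rating(new_loser_mu, new_loser_sigma))
```

Calling update_two_player(Rating(25.0, 25.0 / 3.0), Rating(25.0, 25.0 / 3.0)) raises the winner's mean, lowers the loser's mean by the same amount, and reduces both uncertainties. An unexpected result (a low-rated player beating a high-rated one) produces a larger shift than an expected one.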


Infer.NET

Infer.NET is a framework for automatically applying probabilistic inference to a large variety of problems. Infer.NET has required the development of new, modular machine learning algorithms, along with new ways of architecting inference software that can deliver efficient, customised code tailored to solve particular problems. This work has also led to developments in probabilistic programming.
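As a small illustration of what probabilistic inference means here, the sketch below computes a posterior by brute-force enumeration for a toy model: two coins are tossed, we observe only whether both came up heads, and we ask how likely it is that the first coin was heads. This is plain Python and does not use the Infer.NET API (Infer.NET is a .NET framework that generates inference code instead of enumerating outcomes); the function name posterior_first_heads and the model itself are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def posterior_first_heads(p_first: Fraction = Fraction(1, 2),
                          p_second: Fraction = Fraction(1, 2),
                          both_heads_observed: bool = False) -> Fraction:
    """P(first coin is heads | observation of 'both coins are heads')."""
    evidence = Fraction(0)      # probability of the observation
    joint = Fraction(0)         # probability of observation AND first == heads
    for first, second in product([True, False], repeat=2):
        prior = ((p_first if first else 1 - p_first)
                 * (p_second if second else 1 - p_second))
        # Keep only the outcomes consistent with what was observed.
        if (first and second) != both_heads_observed:
            continue
        evidence += prior
        if first:
            joint += prior
    # Bayes' rule: condition on the observation.
    return joint / evidence
```

With two fair coins and the observation that they were not both heads, posterior_first_heads() returns Fraction(1, 3): learning that "both heads" did not happen lowers the belief that the first coin was heads from 1/2 to 1/3.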

Visually Grounded Natural Language

Natural language communication is enabled by a common understanding of the meanings of individual words, and of the way these words are combined to create meaning from sentences, documents and dialogs.  The core question in natural language research is how these common understandings are learned and represented.  A leading hypothesis is that they emerge from our interactions with a shared visual environment.

Our work on Visually Grounded Natural Language focuses on models which learn the meaning of words through their connection to both static and dynamic visual inputs.  We are starting with connecting individual sentences to static images, with plans to eventually build agents which can participate in complete dialogs grounded in fully interactive visual environments.

A Few Things We’re Proud Of

Technology Transfers


  • German Pattern Recognition Award 2016
  • ICCV 2015 Marr Prize (Best Paper)
  • CHI 2015 Honorable Mention
  • NIPS 2014 Outstanding Paper

Join us!  Do you love to turn mathematics into code?  Do you want to build the future?  Then apply here.