Microsoft Research Podcast | February 19, 2020
The Swiss Joint Research Center, Swiss JRC, is a collaborative research engagement between Microsoft Research and the two universities that make up the Swiss Federal Institutes of Technology: ETH Zurich and EPFL in Lausanne. Since its inception in 2008, the Swiss JRC has supported dozens of collaborative research projects between Microsoft and the two academic partners. Currently, there are fourteen research projects in cutting-edge research areas including artificial intelligence (including computer vision and mixed reality), cloud and quantum computing, systems, security, and human computer interaction.
The Swiss JRC has close links with the Mixed Reality & AI Zurich Lab which uses the Swiss JRC to pursue collaborative research engagements with the two universities.
The Swiss JRC is governed by a Steering Committee consisting of representatives from ETH Zurich, EPFL, Microsoft Switzerland, and Microsoft Research that ensures the smooth operation of the engagement for all three partners and decides on strategic direction. Among other things, the Steering Committee oversees the regular calls for proposal processes, invites to annual workshops, and manages the relationship with other relevant stakeholders in the geography.
EPFL projects (2019-2021)
PhD Student: Mengshi Qi
In recent years, there has been tremendous progress in camera-based 6D object pose, hand pose and human 3D pose estimation. They can now both be done in real time but not yet to the level of accuracy required to properly capture how people interact with each other and with objects, which is a crucial component of modeling the world in which we live. For example, when someone grasps an object, types on a keyboard, or shakes someone else’s hand, the position of their fingers with respect to what they are interacting with must be precisely recovered for the resulting models to be used by AR devices, such as the HoloLens device or consumer-level video see-through AR ones. This remains a challenge, especially given the fact that hands are often severely occluded in the egocentric views that are the norm in AR. We will, therefore, work on accurately capturing the interaction between hands and objects they touch and manipulate. At the heart of it, will be the precise modeling of contact points and the resulting physical forces between interacting hands and objects. This is essential for two reasons. First, objects in contact exert forces on each other; their pose and motion can only be accurately captured and understood if reaction forces at contact points and areas are modeled jointly. Second, touch and touch-force devices, such as keyboards and touch-screens are the most common human-computer interfaces, and by sensing contact and contact forces purely visually, every-day objects could be turned into tangible interfaces, that react as if they were equipped with touch-sensitive electronics. For instance, a soft cushion could become a non-intrusive input device that, unlike virtual mid-air menus, provides natural force feedback. In this talk, I will present some of our preliminary results and discuss our research agenda for the year to come.
PhD Student: Kristina Gligoric
The overall goal of this project is to develop methods for monitoring, modeling, and modifying dietary habits and nutrition based on large-scale digital traces. We will leverage data from both EPFL and Microsoft, to shed light on dietary habits from different angles and at different scales: Our team has access to logs of food purchases made on the EPFL campus with the badges carried by all EPFL members. Via the Microsoft collaborators involved, we have access to Web usage logs from IE/Edge and Bing, and via MSR’s subscription to the Twitter firehose, we gain full access to a major social media platform. Our agenda broadly decomposes into three sets of research questions: (1) Monitoring and modeling: How to mine digital traces for spatiotemporal variation of dietary habits? What nutritional patterns emerge? And how do they relate to, and expand, the current state of research in nutrition? (2) Quantifying and correcting biases: The log data does not directly capture food consumption, but provides indirect proxies; these are likely to be affected by data biases, and correcting for those biases will be an integral part of this project. (3) Modifying dietary habits: Our lab is co-organizing an annual EPFL-wide event called the Act4Change challenge, whose goal is to foster healthy and sustainable habits on the EPFL campus. Our close involvement with Act4Change will allow us to validate our methods and findings on the ground via surveys and A/B tests. Applications of our work will include new methods for conducting population nutrition monitoring, recommending better-personalized eating practices, optimizing food offerings, and minimizing food waste.
EPFL PI: Tobias J. Kippenberg
Microsoft PI: Hitesh Ballani
PhD Student: Arslan Raja
The substantial increase in optical data transmission, and cloud computing, has fueled research into new technologies that can increase communication capacity. Optical communication through fiber, which traditionally has been used for long haul fiber optical communication, is now also employed for short haul communication, even with data-centers. In a similar vein, the increasing capacity crunch in optical fibers, driven in particular by video streaming, can only be met by two degrees of freedom: spatial and wavelength division multiplexing. Spatial multiplexing refers to the use of optical fibers that have multiple cores, allowing to transmit the same carrier wavelength in multiple fibers. Wavelength division multiplexing (WDM or dense-DWM) refers to the use of multiple optical carriers on the same fiber. A key advantage of WDM is the ability to increase line-rates on existing legacy network, without requirements to change existing SMF28 single mode fibers. WDM is also expected to be employed in data-centers. Yet to date, WDM implementation within datacenters faces a key challenge: a CMOS compatible, power efficient source of multi-wavelengths. Currently employed existing solutions, such as multi-laser chips based on InP (as developed by Infinera) cannot be readily scaled to a larger number of carriers. As a result, the currently prevalently employed solution is to use a bank of multiple, individual laser modules. This approach is not viable for datacenters due to space and power constraints. Over the past years, a new technology has rapidly matured – that was developed by EPFL – microresonator frequency combs, or microcombs that satisfy these requirements. The potential of this new technology in telecommunications has recently been demonstrated with the use of microcombs for massively coherent parallel communication on the receiver and transmitter side. Yet to date the use of such micro-combs in data-centers has not been addressed.
- Kippenberg, T. J., Gaeta, A. L., Lipson, M. & Gorodetsky, M. L. Dissipative Kerr solitons in optical microresonators. Science 361, eaan8083 (2018).
- Brasch, V. et al. Photonic chip–based optical frequency comb using soliton Cherenkov radiation. Science aad4811 (2015). doi:10.1126/science.aad4811
- Marin-Palomo, P. et al. Microresonator-based solitons for massively parallel coherent optical communications. Nature 546, 274–279 (2017).
- Trocha, P. et al. Ultrafast optical ranging using microresonator soliton frequency combs. Science 359, 887–891 (2018).
EPFL PIs: Edouard Bugnion
PhD Student: Konstantinos Prasopoulos
The deployment of a web-scale application within a datacenter can comprise of hundreds of software components, deployed on thousands of servers organized in multiple tiers and interconnected by commodity Ethernet switches. These versatile components communicate with each other via Remote Procedure Calls (RPCs) with the cost of an individual RPC service typically measured in microseconds. The end-user performance, availability and overall efficiency of the entire system are largely dependent on the efficient delivery and scheduling of these RPCs. Yet, these RPCs are ubiquitously deployed today on top of general-purpose transport protocols such as TCP. We propose to make RPC first-class citizens of datacenter deployment. This requires a revisitation of the overall architecture, application API, and network protocols. Our research direction is based on a novel RPC-oriented protocol, R2P2, which separates control flow from data flow and provides in-networking scheduling opportunities to tame tail latency. We are also building the tools that are necessary to scientifically evaluate microsesecond-scale services.
ETH Zurich projects (2019-2021)
PhD Student: Lukas Schmid
AR/VR allow new and innovative ways of visualizing information and provide a very intuitive interface for interaction. At their core, they rely only on a camera and inertial measurement unit (IMU) setup or a stereo-vision setup to provide the necessary data, either of which are readily available on most commercial mobile devices. Early adoptions of this technology have already been deployed in the real estate business, sports, gaming, retail, tourism, transportation and many other fields. The current technologies in visual-aided motion estimation and mapping on mobile devices have three main requirements to produce highly accurate 3D metric reconstructions: An accurate spatial and temporal calibration of the sensor suite, a procedure which is typically carried out with the help of external infrastructure, like calibration markers, and by following a set of predefined movements. Well-lit, textured environments and feature-rich, smooth trajectories. The continuous and reliable operation of all sensors involved. This project aims at relaxing these requirements, to enable continuous and robust lifelong mapping on end-user mobile devices. Thus, the specific objectives of this work are: 1. Formalize a modular and adaptable multi-modal sensor fusion framework for online map generation; 2. Improve the robustness of mapping and motion estimation by exploiting high-level semantic features; 3. Develop techniques for automatic detection and execution of sensor calibration in the wild. A modular SLAM (simultaneous localization and mapping) pipeline which is able to exploit all available sensing modalities can overcome the individual limitations of each sensor and increase the overall robustness of the estimation. Such an information-rich map representation allows us to leverage recent advances in semantic scene understanding, providing an abstraction from low-level geometric features – which are fragile to noise, sensing conditions and small changes in the environment – to higher-level semantic features that are robust against these effects. Using this complete map representation, we will explore new ways to detect miscalibrations and sensor failures, so that the SLAM process can be adapted online without the need for explicit user intervention.
ETH Zurich PI: Ce Zhang
Microsoft PI: Matteo Interlandi
PhD Student: Bojan Karlaš
The goal of this project is to mine ML.NET historical data such as user telemetry and logs to understand how ML.NET transformations and learners are used and eventually being able to use this knowledge to automatically provide suggestions to data scientists using ML.NET.Suggestions can be in the form of: Better or additional recipes for unexplored tasks (e.g., neural networks). Auto-completion suggestions for pipelines directly authored for example in .NET or Python.Automatically generation of parameters and sweep strategies optimal for the task at hand. We will try to develop a solution that is extensible such that, if new tasks, algorithms, etc. are added to the library, suggestions will be eventually properly upgraded as well. Additionally, the tool will have to interface with ML.NET and make easy to add new recipes coming either from users or the log mining tool.
ETH Zurich PI: Siyu Tang
PhD Student: Siwei Zhang
Humans are social beings and frequently interacting with one another, e.g. spending a large amount of their time being socially engaged, working in teams, or just being as part of the crowd. Understanding human interaction from visual input is an important aspect of visual cognition and key to many applications including assistive robotics, human-computer interaction and AR/VR. Despite rapid progresses in estimating 3D pose and shape of a single person from RGB images, capturing and modelling human interactions is rather poorly studied in the literature. Particularly for the first-person-view settings, the problem has drawn little attention from the computer vision community. We argue that it is essential for the augmented reality glasses, e.g. Microsoft HoloLens, to capture and model the interactions between the camera wearer and others as the interaction between humans characterises how they move, behave and perform tasks in a collaborative setting.
In this project, we aim to understand how to recognise and predict the interactions between humans under the first-person view setting. To that end, we will create a 3D human-human interaction dataset where the goal is to capture rich and complex interaction signals including body and hand poses, facial expression and gaze directions using Microsoft Kinect and HoloLens. We will develop models that can recognise the dynamics of human interactions and even predict the motion and activities of the interacting humans. We believe such models will facilitate various down-streaming applications for the augmented reality glasses, e.g. Microsoft HoloLens.
PhD Student: Florian Achermann
A major factor restricting the utility of UAVs is the amount of energy aboard, which limits the duration of their flights. Birds face largely the same problem, but they are adept at using their vision to aid in spotting — and exploiting — opportunities for extracting extra energy from the air around them. Project Altair aims at developing infrared (IR) sensing techniques for detecting, mapping and exploiting naturally occurring atmospheric phenomena called thermals for extending the flight endurance of fixed-wing UAVs. In this presentation, we will introduce our vision and goals for this project.
PhD Student: Niels Gleinig
QIRO will establish a new internal representation for compilation systems on quantum computers. Since quantum computation is still emerging, I will provide an introduction to the general concepts of quantum computation and a brief discussion of its strengths and weaknesses from a high-performance computing perspective. This talk is tailored for a computer science audience with basic (popular-science) or no background in quantum mechanics and will focus on the computational aspects. I will also discuss systems aspects of quantum computers and how to map quantum algorithms to their high-level architecture. I will close with the principles of practical implementation of quantum computers and outline the project.
ETH Zurich PI: Andreas Krause
Microsoft PI: Katja Hofmann
PhD Student: David Lindner
Reinforcement learning (RL) is a promising paradigm in machine learning and gained considerable attention in recent years, partly because of its successful application in previously unsolved challenging games like Go and Atari. While these are impressive results, applying reinforcement learning in most other domains, e.g. virtual personal assistants, self-driving cars or robotics, remains challenging. One key reason for this is the difficulty of specifying the reward function a reinforcement learning agent is intended to optimize. For instance, in a virtual personal assistant, the reward function might correspond to the user’s satisfaction with the assistant’s behavior and is difficult to specify as a function of observations (e.g. sensory information) available to the system. In such applications, an alternative to specifying the reward function is to actually query the user for the reward. This, however, is only feasible if the number of queries to the user are limited and the user’s response can be provided in a natural way such that the system’s queries are non-irritating. Similar problems arise in other application domains such as robotics in which, for instance, the true reward can only be obtained by actually deploying the robot but an approximation to the reward can be computed by a simulator. In this case, it is important to optimize the agent’s behavior while simultaneously minimizing the number of costly deployments. This project’s aim is to develop algorithms for these types of problems via scalable active reward learning for reinforcement learning. The project’s focus is on scalability in terms of computational complexity (to scale to large real-world problems) and sample complexity (to minimize the number of costly queries).
PhD Students: Simon Zimmermann
With this project, we aim to accelerate the development of intelligent robots that can assist those in need with a variety of everyday tasks. People suffering from physical impairments, for example, often need help dressing or brushing their own hair. Skilled robotic assistants would allow these persons to live an independent lifestyle. Even such seemingly simple tasks, however, require complex manipulation of physical objects, advanced motion planning capabilities, as well as close interactions with human subjects. We believe the key to robots being able to undertake such societally important functions is learning from demonstration. The fundamental research question is, therefore, how can we enable human operators to seamlessly teach a robot how to perform complex tasks? The answer, we argue, lies in immersive telemanipulation. More specifically, we are inspired by the vision of James Cameron’s Avatar, where humans are endowed with alternative embodiments. In such a setting, the human’s intent must be seamlessly mapped to the motions of a robot as the human operator becomes completely immersed in the environment the robot operates in. To achieve this ambitious vision, many technologies must come together: mixed reality as the medium for robot-human communication, perception and action recognition to detect the intent of both the human operator and the human patient, motion retargeting techniques to map the actions of the human to the robot’s motions, and physics-based models to enable the robot to predict and understand the implications of its actions.
ETH Zurich PI: Christian Holz
Microsoft PI: Ken Hinckley
PhD Student: Hugo Romat
Over the past dozen years, touch input – seemingly well-understood – has become the predominate means of interacting with devices such as smartphones, tablets, and large displays. Yet we argue that much remains unknown – in the form of a seen but unnoticed vocabulary of natural touch – that suggests tremendous untapped potential. For example, touchscreens remain largely ignorant of the human activity, manual behavior, and context-of-use beyond the moment of finger-contact with the screen itself. In a sense, status quo interactions are trapped in a flatland of touch, while systems remain oblivious to the vibrant world of human behavior, activity, and movement that surrounds them.We posit that an entire vocabulary of naturally-occurring gestures – both in terms of the activity of the hands, as well as the subtle corresponding motion and compensatory movements of the devices themselves – exists in plain sight.Our intended outcome is creating a conceptual understanding as well as a deployable interactive system, both of which blend the naturally-occurring gestures – interactions users embody through their actions – with the explicit input through traditional touch operation.
ETH Zurich PI: Onur Mutlu
PhD Student: Lois Orosa
This project examines the architecture and management of next-generation data center storage devices within the context of realistic data-intensive workloads. The aim is to investigate novel techniques that can greatly improve performance, cost, and efficiency in real world systems with real world applications, breaking the barriers between the applications and devices, such that the software can much more effectively and efficiently manage the underlying storage devices that consist of (potentially different types of) flash memory, emerging SCM (storage class memory) technologies, and (potentially different types of) DRAM memories. We realize that there is a disconnect in the communication between applications/software and the NVM devices: the interfaces and designs we currently have enable little communication of useful information from the application/software level (including the kernel) to the NVM devices, and vice versa, causing significant performance and efficiency loss and likely fueling higher “managed” storage device costs because applications cannot even communicate their requirements to the devices. We aim to fundamentally examine the software-NVM interfaces as well as designs for the underlying storage devices to minimize the disconnect in communication and empower applications and system software to more effectively manage the underlying devices, optimizing important system-level metrics that are of interest to the system designer or the application (at different points in time of execution).
EPFL projects (2017-2018)
EPFL PIs: Babak Falsafi, Martin Jaggi
Microsoft Co-PI: Eric Chung
Deep Neural Networks (DNNs) have emerged as algorithms of choice for many prominent machine learning tasks, including image analysis and speech recognition. In datacenters, DNNs are trained on massive datasets to improve prediction accuracy. While the computational demands for performing online inference in an already trained DNN can be furnished by commodity servers, training DNNs often requires computational density that is orders of magnitude higher than that provided by modern servers. As such, operators often use dedicated clusters of GPUs for training DNNs. Unfortunately, dedicated GPU clusters introduce significant additional acquisition costs, break the continuity and homogeneity of datacenters, and are inherently not scalable. FPGAs are appearing in server nodes either as daughter cards (e.g., Catapult) or coherent sockets (e.g., Intel HARP) providing a great opportunity to co-locate inference and training on the same platform. While these designs enable natural continuity for platforms, co-locating inference and training on a single node faces a number of key challenges. First, FPGAs inherently suffer from low computational density. Second, conventional training algorithms do not scale due to inherent high communication requirements. Finally, co-location may lead to contention requiring mechanisms to prioritize inference over training. In this project, we will address these fundamental challenges in DNN inference/training co-location on servers with integrated FPGAs. Our goals are:
- Redesign training and inference algorithms to take advantage of DNNs inherent tolerance for low precision operations.
- Identify good candidates for hard-logic blocks for the next generations of FPGAs.
- Redesign DNN training algorithms to aggressively approximate and compress intermediate results, to target communication bottlenecks and scale the training of single networks to an arbitrary number of nodes.
- Implement FPGA-based load balancing techniques in order to provide latency guarantees for inference tasks under heavy loads and enable the use of idle accelerator cycles to train networks when operating under lower loads.
EPFL PIs: Michael Kapralov, Ola Svensson
The task of grouping data according to similarity is a basic computational task with numerous applications. The right notion of similarity often depends on the application and different measures yield different algorithmic problems. The goal of this project is to design faster and more accurate algorithms for fundamental clustering problems such as the k-means problem, correlation clustering and hierarchical clustering. We propose to perform a fine grained study of these problems and design algorithms that achieve optimal trade-offs between approximation quality, runtime and space/communication complexity, making our algorithms well-suited for modern data models such as streaming and MapReduce.
Several companies are now launching drones that autonomously follow and film their owners, often by tracking a GPS device they are carrying. This holds the promise to fundamentally change the way in which drones are used by allowing them to bring back videos of their owners performing activities, such as playing sports, unimpeded by the need to control the drone. In this project, we propose to go one step further and turn the drone into a personal trainer that will not only film but also analyse the video sequences and provide advice on how to improve performance. For example, a golfer could be followed by such a drone that will detect when he swings and offer advice on how to improve the motion. Similarly, a skier coming down a slope could be given advice on how to better turn and carve. In short, the drone would replace the GoPro-style action cameras that many people now carry when exercising. Instead of recording what they see, it would film them and augment the resulting sequences with useful advice. To make this solution as lightweight as possible, we will strive to achieve this goal using the on-board camera as the sole sensor and free the user from the need to carry a special device that the drone locks onto. This will require:
- Detecting the subject in the video sequences acquired by the drone so as to keep him in the middle of its field of view. This must be done in real-time and integrated into the drone’s control system.
- Recovering the subject’s 3D pose as he moves from the drone’s videos. This can be done with a slight delay since the critique only has to be provided once the motion has been performed.
- Providing feedback. In both the golf and ski cases, this would mean quantifying leg, hips, shoulders, and head position during a swing or a turn, offering practical suggestions on how to change them, and showing how an expert would have performed the same action.
EPFL PI: Babak Falsafi
Microsoft Co-PI: Stavros Volos
Near-memory processing (NMP) is a promising approach to satisfy the performance requirements of modern datacenter services at a fraction of modern infrastructure’s power. NMP leverages emerging die-stacked DRAM technology, which (a) delivers high-bandwidth memory access, and (b) features a logic die, which provides the opportunity for dramatic data movement reduction – and consequently energy savings – by pushing computation closer to the data. In the precursor to this project (the MSR European PhD Scholarship), we evaluated algorithms suitable for database join operators near memory. We showed, while sort join has been conventionally thought of as inferior to hash join in performance for CPUs, near-memory processing favors sequential over random memory access, making sort join superior in performance and efficiency as a near-memory service. In this project, we propose to answer the following questions:
- What data-specific functionality should be implemented near memory (e.g., data filtering, data reorganization, data fetch)?
- What ubiquitous, yet simple system-level functionality should be implemented near memory (e.g., security, compression, remote memory access)?
- How should the services be integrated with the system (e.g., how does the software use them)?
- How do we employ near-threshold logic in near-memory processing?
EPFL PIs: Rachid Guerraoui, Georgios Chatzopoulos
Microsoft Co-PI: Aleksandar Dragojevic
Modern hardware trends have changed the way we build systems and applications. Increasing memory (DRAM) capacities at reduced prices make keeping all data in-memory cost-effective, presenting opportunities for high performance applications such as in-memory graphs with billions of edges (e.g. Facebook’s TAO). Non-Volatile RAM (NVRAM) promises durability in the presence of failures, without the high price of disk accesses. Yet, even with this increase in inexpensive memory, storing the data in the memory of one machine is still not possible for applications that operate on TB of data, and systems need to distribute the data and synchronize accesses among machines. This project proposes the design and building of support for high-level transactions on top of modern hardware platforms, using the Structured Query Language (SQL). The important question to be answered is whether transactions can get the maximum benefit of these modern networking and hardware capabilities, while offering a significantly easier interface for developers to work with. This project will require both research in the transactional support to be offered, including the operations that can be efficiently supported, as well as research in the execution plans for transactions in this distributed setting.
EPFL PI: Florin Dinu
The goal of our project is to improve the utilization of server resources in data centers. Our proposed approach was to attain a better understanding of the resource requirements of data-parallel applications and then incorporate this understanding into the design of more informed and efficient data center (cluster) schedulers. While pursuing these directions we have identified two related challenges that we believe hold the key towards significant additional improvements in application performance as well as cluster-wide resource utilization. We will explore these two challenges as a continuation of our project. These two challenges are: Resource inter-dependency and time-varying resource requirements. Resource inter-dependency refers to the impact that a change in the allocation of one server resource (memory, CPU, network bandwidth, disk bandwidth) to an application has on that application’s need for the other resources. Time-varying resource requirements refers to the fact that over the lifetime of an application its resource requirements may vary. Studying these two challenges together holds the potential for improving resource utilization by aggressively but safely collocating applications on servers.
ETH Zurich projects (2017-2018)
ETH Zurich PI: Gustavo Alonso
Microsoft Co-PI: Ken Eguro
While in the first phase of the project we explored the efficient implementation of data processing operators in FPGAs as well as the architectural issues involved in the integration of FPGAs as co-processors in commodity servers, in this new proposal we intend to focus on architectural aspects of in-network data processing. The choice is motivated by the growing gap between the bandwidth and very low latencies that modern networks support and the overhead of ingress and egress from VMs and applications running on conventional CPUs. A first goal is to explore the type of problems and algorithms that can be best run as the data flows through the network so as to be able to exploit the bare wire speed and allow off-loading of expensive computations to the FPGA. A second, but not less important goal, is to explore how to best operate FPGA based accelerators when directly connected to the network and operating independently from the software part of the application. In terms of applications, the focus will remain on data processing (relational, No-SQL, data warehouses, etc.) with the intention of starting to move towards machine learning algorithms at the end of the two-year project. On the network side, the project will work on developing networking protocols suitable to this new configuration and how to combine the network stack with the data processing stack.
ETH Zurich PIs: Onur Mutlu, Luca Benini
Microsoft Co-PI: Derek Chiou
Today’s systems are overwhelmingly designed to move data to computation. This design choice goes directly against key trends in systems and technology that cause performance, scalability and energy bottlenecks:
- data access from memory is a key bottleneck as applications become more data-intensive and memory bandwidth and energy do not scale well,
- energy consumption is a key constraint in especially mobile and server systems,
- data movement is very costly in terms of bandwidth, energy and latency, much more so than computation.
Our goal is to comprehensively examine the premise of adaptively performing computation near where the data resides, when it makes sense to do so, in an implementable manner and considering multiple new memory technologies, including 3D-stacked memory and non-volatile memory (NVM). We will examine practical hardware substrates and software interfaces to accelerate key computational primitives of modern data-intensive applications in memory, runtime and software techniques that can take advantage of such substrates and interfaces. Our special focus will be on key data-intensive applications, including deep learning, neural networks, graph processing, bioinformatics (DNA sequence analysis and assembly), and in-memory data stores. Our approach is software/hardware cooperative, breaking the barriers between the two and melding applications, systems and hardware substrates for extremely efficient execution, while still providing efficient interfaces to the software programmer.
ETH Zurich PI: Otmar Hilliges
Microsoft Co-PI: Marc Pollefeys
Micro-aerial vehicles (MAVs) have been made accessible to end-users via the emergence of simple to use hardware and programmable software platforms and have seen a surge in consumer and research interest as a consequence. Clearly there is a desire to use such platforms in a variety of application scenarios but manually flying quadcopters remains a surprisingly hard task even for expert users. More importantly, state-of-the-art technologies offer only very limited support for users who want to employ MAVs to reach a certain high-level goal. This is maybe best illustrated by the currently most successful application area – that of aerial videography. While manual flight is hard, piloting and controlling a camera simultaneously is practically impossible. An alternative to manual control is offered via waypoint based control of MAVs, shielding novices from the underlying complexities. However, this simplicity comes at the cost of flexibility and existing flight planning tools are not designed with high-level user goals in mind. Building on our own (MSR JRC funded) prior work, we propose an alternative approach to robotic motion planning. The key idea is to let the user work in solution-space – instead of defining trajectories the user would define what the resulting output should be (e.g., shot composition, transitions, area to reconstruct). We propose an optimization-based approach that takes such high-level goals as input and generates the trajectories and control inputs for a gimbal mounted camera automatically. We call this solution-space driven, inverse kinematic motion planning. Defining the problem directly in the solution space removes several layers of indirection and allows users to operate in a more natural way, focusing only on the application specific goals and the quality of the final result, whereas the control aspects are entirely hidden.
ETH Zurich PIs: Thomas Hofmann, Aurélien Lucchi
Microsoft Co-PI: Sebastian Nowozin
The past decade has seen a growth in application of big data and machine learning systems. Probabilistic models of data are theoretically well understood and in principle provide an optimal approach to inference and learning from data. However, for richly structured data domains such as natural language and images, probabilistic models are often computationally intractable and/or have to make strong conditional independence assumptions to retain computational as well as statistical efficiency. As a consequence, they are often inferior in predictive performance, when compared to current state-of-the-art deep learning approaches. It is a natural question to ask, whether one can combine the benefits of deep learning with those of probabilistic models. The major conceptual challenge is to define deep models that are generative, i.e. that can be thought of as models of the underlying data generating mechanism. We thus propose to leverage and extend recent advances in generative neural networks to build rich probabilistic models for structured domains such as text and images. The extension of efficient probabilistic neural models will allow us to represent complex and multimodal uncertainty efficiently. To demonstrate the usefulness of the developed probabilistic neural models we plan to apply them to challenging multimodal applications such as creating textual descriptions for images or database records.
EPFL projects (2014-2016)
EPFL PI: Serge Vaudenay
Microsoft PI: Markulf Kohlweiss
For an encryption scheme to be practically useful, it must deliver on two complementary goals: the confidentiality and integrity of encrypted data. Historically, these goals were achieved by combining separate primitives, one to ensure confidentiality and another to guarantee integrity. This approach is neither the most efficient (for instance, it requires processing the input stream at least twice), nor does it protect against implementation errors. To address these concerns, the notion of Authenticated Encryption (AE), which simultaneously achieves confidentiality and integrity, was put forward as a desirable first-class primitive to be exposed by libraries and APIs to the end developer. Providing direct access to AE rather than requiring developers to orchestrate calls to several lower-level functions is seen as a step towards improving the quality of security-critical code. An indication of both the importance of useable AE and the difficulty of getting it right, are the number of standards that were developed over the years. These specified different methods for AE: the CCM method is specified in IEEE 802.11i, IPsec ESP, and IKEv2; the GCM method is specified in NIST SP 800-38D; the EAX method is specified in ANSI C12.22; and ISO/IEC 19772:2009 defines six methods, including five dedicated AE designs and one generic composition method, namely Encrypt-then-MAC. Several security issues have recently arisen and been reported in the (mis)use of symmetric key encryption with authentication in practice. As a result, the cryptographic community has initiated the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR), to boost public discussions towards a better understanding of these issues, and to identify a portfolio of efficient and secure AE schemes. Our project aims to contribute to the design, analysis, evaluation, and classification of the emerging AE schemes during the CAESAR competition. It has effected many practical security protocols that use AE schemes as indispensable underlying primitives. Our work has broader implications for the theory of AE as an important research area in symmetric-key cryptography.
EPFL PIs: Edouard Bugnion, Babak Falsafi
Microsoft PI: Dushyanth Naraya
The goal of the Scale-Out NUMA project is to deliver energy-efficient, low-latency access to remote memory in datacentre applications, with a focus on rack-scale deployments. Such infrastructure will become critical for both web-scale only applications as well as scale-out analytics where the dataset can reside in the collective (but distributed) memory of a cluster of servers. Our approach to the problem layers an RDMA-inspired programming model directly on top of a NUMA fabric via stateless messaging protocol. To facilitate interactions between the application, the OS and the fabric, soNUMA relies on the remote memory controller – a new architecturally-exposed hardware block integrated into the node’s local coherence hierarchy.
EPFL PI: Florin Dinu
Microsoft PI: Sergey Legtchenko
Our vision is of resource-efficient datacenters where the compute nodes are fully utilized. We see two challenges to manifesting this vision. The first is the increasing use of hardware heterogeneity in datacenters. Heterogeneity, while both unavoidable and desirable, does not lend itself to today’s systems and algorithms, which inefficiently handle heterogeneity. The second challenge is the aggressive scale-out of datacenters. Scale-out has made it conveniently easy to disregard inefficiencies at the level of individual compute nodes because it has been historically easy to expand to new resources. However, apart from being unnecessarily costly, such scale-out techniques are now becoming impractical due to the size of the datasets. Moreover, scale-out often adds new inefficiencies. We argue that to meet these challenges, we must start from a thorough understanding of the resource requirements of today’s datacenter jobs. With this understanding, we aim to design new scheduling techniques that efficiently use resources, even in heterogeneous environments. Further, we aim to fundamentally change the way data-parallel processing systems are built and to make efficient compute node resource utilization a cornerstone of their design. Our first goal is to automatically characterize the pattern of memory requirements of data-parallel jobs. Specifically, we want to go beyond the current practices that are interested only in peak memory usage. To better identify opportunities for efficient memory management, more granular information is necessary. Our second goal is to use knowledge of the pattern of memory requirements to design informed scheduling algorithms that manage memory efficiently. The third goal of the project is to design data-parallel processing systems that are efficient in terms of managing memory, not only by understanding task memory requirements, but also by shaping those memory requirements.
ETH Zurich projects (2014-2016)
ETH Zurich PI: Torsten Hoefler
Microsoft PI: Miguel Castro
Disk-backed in-memory key/value stores are gaining significance as many industries are moving toward big data analytics. Storage space and query time requirements are challenging, since the analysis has to be performed at the lowest cost to be useful from a business perspective. Despite those cost constraints, today’s systems are heavily overprovisioned when it comes to resiliency. The undifferentiated three-copy approach leads to a potential waste of bandwidth and storage resources, which then makes the overall system less efficient or more expensive. We propose to revisit currently used resiliency schemes, with the help of analytical hardware failure models. We will utilize those models to capture the exact tradeoff between the overhead due to replication and the exact resiliency requirements that are defined in a contract. Our key idea is to model reliability as an explicit resource that the user allocates consciously. In previous work, we have been able to speed-up scientific computing applications, as well as a distributed hashtable, on several hundred-thousand cores by more than 20 percent, with the use of advanced RDMA programming techniques. We have also demonstrated low-cost resiliency schemes based on erasure coding for RDMA environments. In addition, we propose to apply our experience with large-scale RDMA programming to the design of in-memory databases, a problem very similar to distributed hashtables. To make reliability explicit, we plan to extend the key value store with explicit reliability attributes that allow the user to specify reliability and availability requirements for each key (or group of keys). Our work may change the perspective in datacenter resiliency. Defining fine-grained, per-object resiliency levels and tuning them to the exact environment may provide large cost benefits and impact industry. For example, changing the standard three-replica scheme to erasure coding can easily save 30 percent of storage expenses.
ETH Zurich PI: Gustavo Alonso
Microsoft PI: Ken Eguro
One of the biggest challenges for software these days is to adapt to the rapid changes in hardware and processor architecture. On the one hand, extracting performance from modern hardware requires dealing with increasing levels of parallelism. On the other hand, the wide variety of architectural possibilities and multiplicity of processor types raise many questions in terms of the optimal platform for deploying applications. In this project we will explore the efficient implementation of data processing operators in FPGAs, as well as the architectural issues involved in the integration of FPGAs as co-processors in commodity servers. The target application is big data and data processing engines (relational, No-SQL, data warehouses, etc.). Through this line of work, the project aims at exploring architectures that will result in computing nodes with a smaller energy consumption and physical size, but capable of providing a performance boost to applications for big data. FPGAs should be seen here not as a goal in themselves, but as an enabling platform for the exploration of different architectures and levels of parallelism that will allow us to bypass the inherent restriction of conventional processors. On the practical side, the project will focus on both the use of FPGAs as co-processors inside existing engines, as well as on developing proof-of-concept implementations of data processing engines entirely implemented in FPGAs. In this area, the project complements very well with ongoing efforts at Microsoft Research around Cipherbase, a trusted computing system based on SQL server deployments in the cloud. On the conceptual side, the project will explore the development of data structures and algorithms capable of exploiting the massive parallelism available in FPGAs, with a view to gaining much needed insights on how to adapt existing data processing systems to multi- and many-core architectures. Here, we expect to gain insights on how to redesign both standard relational data operators, as well as data mining and machine learning operators, to better take advantage of the increasing amounts of parallelism available in future processors.
ETH Zurich PIs: Otmar Hilliges, Marc Pollefeys
Microsoft PI: Shahram Izadi
In recent years, robotics research has made tremendous progress and it is becoming conceivable that robots will be as ubiquitous and irreplaceable in our daily lives as they are within industrial settings. Continued improvements, in terms of mechatronics and control aspects, coupled with continued advances in consumer electronics, have made robots ever smaller, autonomous, and agile. One area of recent advances in robotics is the notion of micro-aerial vehicles (MAVs) [14, 16]. These are small, flying robots that are very agile, can operate in a 3D space, indoors and outdoors, and can carry small payloads — including input and output devices — and can navigate difficult environments, such as stairs, more easily than terrestrial robots; and hence can reach locations that no other robot or indeed humans can reach. Surprisingly, to date there is little research on such flying robots in an interactive context or on MAVs operating in near proximity to humans. In our project, we explore the opportunities that arise from aerial robots that operate in close proximity to and in collaboration with a human user. In particular, we are interested in developing a robotic platform in which a) the robot is aware of the human user and can navigate relative to the user; b) the robot can recognize various gestures from afar, as well as receive direct, physical manipulations; c) the robot can carry small payloads — in particular input and output devices such as additional cameras or projectors. Finally, we are developing novel algorithms to track and recognize user input, by using the onboard cameras, in real-time and with very low-latency, to build on the now substantial body of research on gestural and natural interfaces. Gesture recognition can be used for MAV control (for example, controlling the camera) or to interact with virtual content.
ETH Zurich PI: Roger Wattenhofer
Microsoft PI: Ratul Mahajan
The Internet is designed as a robust service to ensure that we can use it with selfish participants present. As such, a loss in total performance must be accepted. However, if a whole wide-area network (WAN) was controlled by a single entity, why should one use the very techniques designed for the Internet? Large providers such as Microsoft, Amazon, or Google operate their own WANs, which cost them hundreds of millions of dollars per year; yet even their busier links average only 40–60 percent utilization. This gives rise to Software Defined Networks (SDNs), which allow the separation of the data and the control plane in a network. A centralized controller can install and update rules all over the WAN, to optimize its goals. Despite SDNs receiving a lot of attention in both theory and practice, many questions are still unanswered. Even though the control of the network is centralized, distributing the updates does not happen instantaneously. Numerous problems can occur, such as the dropping of packets, generation of loops, breaking the memory/bandwidth limit of switches/links, and missing packet coherence. These problems must be solved before SDNs can be broadly deployed. This research project sheds more light on these fundamental issues of SDNs and how they can be tackled. In parallel, we look at SDNs from a game-theoretic perspective.
Microsoft Research Podcast | April 10, 2019
Microsoft Research Blog | February 5, 2019
Microsoft continues to invest in research cooperation with ETH and EPFL, thereby strengthening Switzerland’s innovation power
Microsoft Switzerland | November 1, 2018
Microsoft Switzerland | October 29, 2018
Microsoft Research Blog | December 4, 2017
Microsoft Research Blog | February 21, 2017
Microsoft Research Blog | February 5, 2014
Don’t read German? Download Microsoft Translator
Mixed-Reality-Brillen werden die Smartphones verdrängen (German)
Inside It | June 1, 2021
Smarte Brille fuer jedermann (German, PDF)
Bilanz | Das Schweizer Wirtschaftsmagazin | February 2020
Ein Quantum Hype (German, PDF)
Bilanz | Das Schweizer Wirtschaftsmagazin | February 2020
Demystifying Quantum – Panel discussion at WEF Annual Meeting 2020
ETH Zurich | January 21, 2020
Mixed Reality & AI Zurich Lab – a Lab is born
Microsoft News Center – Switzerland | October 3, 2019
CyberPeace Institute fills a critical need for cyberattack victims
Microsoft Blog | September 26, 2019
DNA as storage medium of the future (German)
Neue Zürcher Zeitung | March 30, 2019
Smartglasses will replace the mobile phone (German, PDF)
Tages-Anzeiger | March 13, 2019
Man and Machine – Panel discussion at WEF Annual Meeting 2019
ETH Zurich | February 12, 2019
Microsoft CEO Satya Nadella at ETH Zurich
Startup Ticker | November 1, 2018
ETH and Microsoft: Hunting for talent (German)
ETH News | November 1, 2018
Microsoft expands partnership with ETH Zurich (German)
Inside IT & Inside Channels | November 1, 2018
Microsoft and ETH are chasing talent (German)
Netzwoche | November 2, 2018
Microsoft CEO Satya Nadella visits ETH Zurich (German)
Computerworld | November 2, 2018
Microsoft extends cooperation with ETH, EPFL, partners with Mixed Reality & AI Zurich Lab
Telecompaper | November 2, 2018
We invite submissions for collaborative research projects between ETH Zurich/EPFL and Microsoft, including Microsoft Research and Mixed Reality & AI Labs in Zurich and Cambridge, until September 17, 2021. For more details, see Swiss JRC RFP.
Vice President for Information Systems
Partner Group Hardware Eng Mgr Partner
Jen Jen Chung
Senior Principal Research Manager
Technical Fellow and Chief Scientific Officer
Senior Principal Researcher
Deputy Head of Institute for Machine Learning
Deputy Director, Autonomous Systems Lab
Partner Director of Science
Principal Research Manager
Head of Wyss Translational Center Zurich / Head of Inst. Robotics and Intelligent Systems
VP and Distinguished Engineer
Tenure Track Assistant Professor
Ryen W. White
Partner Research Manager
Professor Department of Computer Science
National Technology Officer (NTO)
Distinguished Scientist and Director of Redmond Lab
Dean of School of Computer and Communications Science