Microsoft at NSDI 2024: Discoveries and implementations in networked systems

By , Managing Director, Research for Industry

Networked systems and their applications are essential in building the reliable, scalable, secure, and innovative infrastructure required to meet society’s evolving needs. One of the premier events in this field, the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI ’24), provides a platform for researchers and experts to share insights, present research findings, and collaborate on the latest advances in the design, implementation, and evaluation of networked and distributed systems.

Microsoft is honored to support NSDI ’24 as a returning sponsor. This partnership underscores our commitment to fostering innovation and research within the field. Additionally, members of our team have taken on key roles in organizing the event, including contributions to the program committee and leadership as conference co-chair.

We are pleased to announce that 19 papers from Microsoft researchers and their partners have been accepted to the conference, including a paper on Autothrottle, a resource management framework, which won the Outstanding Paper Award. These papers represent a broad spectrum of research topics, ranging from 5G, space, datacenters, and wide-area networking to applications in artificial intelligence, security, video conferencing, and gaming. They encompass both early-stage research and systems already deployed in production. This post highlights some of this work.

Paper highlights

Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices

Outstanding Paper Award

As cloud applications increasingly adopt microservices, resource managers face two distinct levels of system behavior: end-to-end application latency and per-service resource usage. To coordinate them, this research introduces Autothrottle, a bi-level resource management framework for microservices with latency service-level objectives (SLOs). Autothrottle employs an application-wide learning-based controller to periodically set performance targets, expressed as CPU throttle ratios, for per-service heuristic controllers to achieve. When tested using production workloads, Autothrottle demonstrated higher CPU savings and fewer SLO violations than the best-performing baseline from Kubernetes.
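
To make the idea of a per-service controller chasing a throttle-ratio target more concrete, here is a minimal sketch. It is an illustration only and not the Autothrottle implementation: the function name `adjust_cpu_quota`, the fixed multiplicative `step`, and the quota bounds are all assumptions made for this example.

```python
def adjust_cpu_quota(quota_cores, observed_throttle_ratio, target_throttle_ratio,
                     step=0.1, min_cores=0.1, max_cores=64.0):
    """Nudge a service's CPU quota so its throttle ratio moves toward the target.

    Hypothetical helper for illustration; the step size and bounds are assumed
    values, not parameters taken from the paper.
    """
    if not 0.0 <= observed_throttle_ratio <= 1.0:
        raise ValueError("observed_throttle_ratio must be in [0, 1]")
    if not 0.0 <= target_throttle_ratio <= 1.0:
        raise ValueError("target_throttle_ratio must be in [0, 1]")

    if observed_throttle_ratio > target_throttle_ratio:
        # Throttled more often than the target allows: grant more CPU.
        new_quota = quota_cores * (1.0 + step)
    elif observed_throttle_ratio < target_throttle_ratio:
        # Throttled less often than the target: reclaim some CPU.
        new_quota = quota_cores * (1.0 - step)
    else:
        new_quota = quota_cores

    # Keep the quota inside the assumed bounds.
    return min(max_cores, max(min_cores, new_quota))
```

In this sketch, an application-wide controller would periodically choose `target_throttle_ratio` for each service, and each service would call a function like this on its own, faster schedule using locally observed throttling.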

Application-Level Service Assurance with 5G RAN Slicing

This paper presents Zipper, an innovative Radio Access Network (RAN) slicing system that provides application-specific throughput and latency. Traditional methods focus on overall slice performance, often neglecting individual app needs, leading to optimization challenges. Zipper addresses this by adopting a model-predictive control approach, precisely managing network use per user with an advanced algorithm for optimal bandwidth allocation. Additionally, it offers a tool for network operators to assess the feasibility of new applications without exceeding capacity.

Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things

Low Earth Orbit satellite networks offer a promising way to connect low-power IoT devices globally without terrestrial gateways, using cost-effective pico-satellites. This paper addresses the communication challenges these networks face, such as limited link budgets, satellite movement, and signal interference. It introduces a novel technique that uses the Doppler shift caused by satellite motion to improve packet detection and decoding, even with low signal-to-noise ratios or during packet collisions. This method, called Spectrumize, achieves a threefold increase in packet detection and over 80 percent accuracy in decoding, significantly outperforming conventional methods.
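
For a sense of scale, the Doppler shift mentioned above can be estimated with the standard first-order formula, shift = carrier frequency × radial velocity / speed of light. The sketch below only evaluates that textbook formula; it is not Spectrumize's detection or decoding method, and the function name, the 400 MHz carrier, and the 7 km/s radial velocity are illustrative assumptions rather than values from the paper.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre


def doppler_shift_hz(carrier_hz, radial_velocity_m_s):
    """First-order (non-relativistic) Doppler shift in Hz.

    A positive radial velocity means the satellite is approaching the
    receiver, which raises the received frequency.
    """
    return carrier_hz * radial_velocity_m_s / SPEED_OF_LIGHT_M_S


# Assumed, illustrative numbers: a 400 MHz carrier and 7 km/s radial velocity.
shift = doppler_shift_hz(400e6, 7_000.0)
print(f"Doppler shift: {shift / 1e3:.1f} kHz")
```

With these assumed numbers the shift is roughly 9.3 kHz, and it changes sign as the satellite passes overhead and begins to recede.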

Solving Max-Min Fair Resource Allocations Quickly on Large Graphs

This paper tackles the problem of max-min fair resource allocation, crucial for WAN traffic engineering and cluster scheduling, especially as scale increases. This research streamlines the process into a single rapid optimization task to accommodate multi-path scenarios. Tests show that these algorithms surpass previous methods, delivering quicker, fairer, and more efficient allocations. Implemented in Azure’s WAN traffic engineering, these methods not only retain solution quality but also achieve about a threefold increase in processing speed.
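
For readers unfamiliar with max-min fairness, the sketch below shows the classic progressive-filling ("water-filling") procedure for a single shared resource. It is the textbook baseline, not the algorithm from the paper, which targets large multi-path settings; the function name `max_min_fair` and the example demands are assumptions made for illustration.

```python
def max_min_fair(capacity, demands):
    """Max-min fair split of one shared resource via progressive filling.

    Demands smaller than the current equal share are met in full; whatever
    capacity is left over is split equally among the remaining demands.
    """
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit demands from smallest to largest.
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])

    while unsatisfied:
        fair_share = remaining / len(unsatisfied)
        smallest = unsatisfied[0]
        if demands[smallest] <= fair_share:
            # This demand fits within its share: satisfy it fully.
            allocation[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            unsatisfied.pop(0)
        else:
            # Every remaining demand exceeds the share: cap them all at it.
            for i in unsatisfied:
                allocation[i] = fair_share
            break
    return allocation


# Four flows share 10 units; the two small demands are met in full and
# the two large ones split what is left equally.
print(max_min_fair(10, [2, 2.6, 4, 5]))
```

In the example, the flows asking for 2 and 2.6 units receive their full demand, and the other two each receive about 2.7 units.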

Finding Adversarial Inputs for Heuristics using Multi-level Optimization

Production systems often employ heuristics because they are faster and scale better than their optimal counterparts. However, practitioners may not be aware of the performance differences between a heuristic and the optimal solution, or between two heuristics, in real-world scenarios. MetaOpt addresses this by enabling direct comparison of heuristics against optimal solutions or other heuristics. It efficiently processes inputs for a solver to identify performance gaps and generate adversarial inputs that expose these differences, scaling effectively to real-world problems.

NetVigil: Robust and Low-Cost Anomaly Detection for East-West Data Center Security

This research introduces NetVigil, an anomaly-detection system that monitors east-west traffic in data centers. It extracts graph-based features from network flows and applies graph neural networks (GNNs) with contrastive learning to improve its effectiveness against both common and sophisticated threats. When tested across multiple attack scenarios and real-world data, NetVigil significantly outperformed existing anomaly detection solutions in accuracy, cost-efficiency, and speed, offering a viable addition to safeguard data center traffic.

Complete list of accepted publications by Microsoft researchers

ADR-X: ANN-Assisted Wireless Link Rate Adaptation for Compute-Constrained Embedded Gaming Devices
Hao Yin, University of Washington; Murali Ramanujam, Princeton University; Joe Schaefer, Stan Adermann, Srihari Narlanka, and Perry Lea, Microsoft; Ravi Netravali, Princeton University; Krishna Chintalapudi, Microsoft Research

Application-Level Service Assurance with 5G RAN Slicing
Arjun Balasingam, MIT CSAIL; Manikanta Kotaru and Victor Bahl, Microsoft Research

Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices
Zibo Wang, University of Science and Technology of China and Microsoft Research; Pinghe Li, ETH Zurich; Chieh-Jan Mike Liang, Microsoft Research; Feng Wu, University of Science and Technology of China; Francis Y. Yan, Microsoft Research

CHISEL: An optical slice of the wide-area network
Abhishek Vijaya Kumar, Cornell University; Bill Owens, NYSERnet; Nikolaj Bjørner, Binbin Guan, Yawei Yin, and Victor Bahl, Microsoft; Rachee Singh, Cornell University

Cloud-LoRa: Enabling Cloud Radio Access LoRa Networks Using Reinforcement Learning Based Bandwidth-Adaptive Compression
Muhammad Osama Shahid, Daniel Koch, Jayaram Raghuram, and Bhuvana Krishnaswamy, University of Wisconsin-Madison; Krishna Chintalapudi, Microsoft Research; Suman Banerjee, University of Wisconsin-Madison

Cyclops: A Nanomaterial-based, Battery-Free Intraocular Pressure (IOP) Monitoring System inside Contact Lens
Liyao Li, University at Buffalo SUNY and Northwest University; Bozhao Shang and Yun Wu, Northwest University and Shaanxi International Joint Research Centre for the Battery-Free Internet of Things; Jie Xiong, University of Massachusetts Amherst and Microsoft Research Asia; Xiaojiang Chen, Northwest University and Shaanxi International Joint Research Centre for the Battery-Free Internet of Things; Yaxiong Xie, University at Buffalo SUNY

ExChain: Exception Dependency Analysis for Root Cause Diagnosis
Ao Li, Carnegie Mellon University; Shan Lu, Microsoft Research and University of Chicago; Suman Nath, Microsoft Research; Rohan Padhye and Vyas Sekar, Carnegie Mellon University

Finding Adversarial Inputs for Heuristics using Multi-level Optimization
Pooria Namyar, Microsoft and University of Southern California; Behnaz Arzani and Ryan Beckett, Microsoft; Santiago Segarra, Microsoft and Rice University; Himanshu Raj and Umesh Krishnaswamy, Microsoft; Ramesh Govindan, University of Southern California; Srikanth Kandula, Microsoft

Gemino: Practical and Robust Neural Compression for Video Conferencing
Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, and Mehrdad Khani, Massachusetts Institute of Technology; Sadjad Fouladi, Microsoft Research; Mohammad Alizadeh, Frédo Durand, and Vivienne Sze, Massachusetts Institute of Technology

GRACE: Loss-Resilient Real-Time Video through Neural Codecs
Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, and Yue Zhang, The University of Chicago; Qizheng Zhang, Stanford University; Yuhan Liu, Kuntai Du, and Xu Zhang, The University of Chicago; Francis Y. Yan, Microsoft; Amrita Mazumdar, NVIDIA; Nick Feamster and Junchen Jiang, The University of Chicago

LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search
Chengquan Feng, University of Science and Technology of China; Li Lyna Zhang, Microsoft Research; Yuanchi Liu, University of Science and Technology of China; Jiahang Xu and Chengruidong Zhang, Microsoft Research; Zhiyuan Wang, University of Science and Technology of China; Ting Cao and Mao Yang, Microsoft Research; Haisheng Tan, University of Science and Technology of China

Making Kernel Bypass Practical for the Cloud with Junction
Joshua Fried and Gohar Irfan Chaudhry, MIT CSAIL; Enrique Saurez, Esha Choukse, and Íñigo Goiri, Azure Research – Systems; Sameh Elnikety, Microsoft Research; Rodrigo Fonseca, Azure Research – Systems; Adam Belay, MIT CSAIL

MESSI: Behavioral Testing of BGP Implementations
Rathin Singha and Rajdeep Mondal, University of California Los Angeles; Ryan Beckett, Microsoft; Siva Kesava Reddy Kakarla, Microsoft Research; Todd Millstein and George Varghese, University of California Los Angeles

NetVigil: Robust and Low-Cost Anomaly Detection for East-West Data Center Security
Kevin Hsieh, Microsoft; Mike Wong, Princeton University and Microsoft; Santiago Segarra, Microsoft and Rice University; Sathiya Kumaran Mani, Trevor Eberl, and Anatoliy Panasyuk, Microsoft; Ravi Netravali, Princeton University; Ranveer Chandra and Srikanth Kandula, Microsoft

OPPerTune: Post-Deployment Configuration Tuning of Services Made Easy
Gagan Somashekar, Stony Brook University; Karan Tandon and Anush Kini, Microsoft Research; Chieh-Chun Chang and Petr Husak, Microsoft; Ranjita Bhagwan, Google; Mayukh Das, Microsoft365 Research; Anshul Gandhi, Stony Brook University; Nagarajan Natarajan, Microsoft Research

Sequence Abstractions for Flexible, Line-Rate Network Monitoring
Andrew Johnson, Princeton University; Ryan Beckett, Microsoft Research; Xiaoqi Chen, Princeton University; Ratul Mahajan, University of Washington; David Walker, Princeton University

Solving Max-Min Fair Resource Allocations Quickly on Large Graphs
Pooria Namyar, Microsoft and University of Southern California; Behnaz Arzani and Srikanth Kandula, Microsoft; Santiago Segarra, Microsoft and Rice University; Daniel Crankshaw and Umesh Krishnaswamy, Microsoft; Ramesh Govindan, University of Southern California; Himanshu Raj, Microsoft

Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things
Vaibhav Singh, Tusher Chakraborty, and Suraj Jog, Microsoft Research; Om Chabra and Deepak Vasisht, UIUC; Ranveer Chandra, Microsoft Research

Vulcan: Automatic Query Planning for Live ML Analytics
Yiwen Zhang and Xumiao Zhang, University of Michigan; Ganesh Ananthanarayanan, Microsoft; Anand Iyer, Georgia Institute of Technology; Yuanchao Shu, Zhejiang University; Victor Bahl, Microsoft Research; Z. Morley Mao, University of Michigan and Google; Mosharaf Chowdhury, University of Michigan

Symposium organizers from Microsoft

Program Committee Co-Chair

Irene Zhang

Program Committee

Paolo Costa
Anuj Kalia
Amar Phanishayee
Dan Ports
Francis Yan 

Mentoring Co-Chair

Jay Lorch

Test of Time Awards Committee

Jay Lorch
Amar Phanishayee

Steering Committee

Jay Lorch
Amar Phanishayee
