Microsoft at NSDI 2023: A commitment to advancing networking and distributed systems

By , Managing Director, Research for Industry

Microsoft has made significant contributions to the prestigious USENIX NSDI’23 conference, which brings together experts in computer networks and distributed systems. A silver sponsor for the conference, Microsoft is a leader in developing innovative technologies for networking, and we are proud to have contributed to 30 papers accepted this year. Our team members also served on the program committee, highlighting our commitment to advancing the field.

The accepted research papers span a wide range of topics, including networking for AI workloads, cloud networking, WAN, and wireless networks. These papers showcase some of the latest advancements in networking research.

The paper, “DOTE: Rethinking (Predictive) WAN Traffic Engineering”, which revisits traffic engineering in the Wide Area Network (WAN), was selected for one of the Best Paper Awards at the conference. This work was done jointly by researchers at Microsoft, along with academics at Hebrew University of Jerusalem and Technion.

Other innovations in cloud networking infrastructure include:

Empowering Azure Storage with RDMA, which presents the findings from deploying intra-region Remote Direct Memory Access (RDMA) to support storage workloads in Azure. Today, around 70% of traffic in Azure is RDMA and intra-region RDMA is supported in all Azure public regions. RDMA helps us achieve significant disk I/O performance improvements and CPU core savings. This research is a testament to Microsoft’s ongoing commitment to providing customers with the best possible user experience.

Disaggregating Stateful Network Functions, which introduces a new approach for better reliability and performance at a lower per-server cost for cloud users. The core idea is to move the network function processing off individual servers and into shared resource pools. This technology is now shipping as part of Microsoft Azure Accelerated Connections.

Our colleagues from Microsoft Research Asia will present ARK: GPU-driven Code Execution for Distributed Deep Learning, which overcomes the overhead of GPU communication for large deep learning workloads by having GPUs run their code and handle communication events autonomously, without CPU intervention.

Microsoft’s collective contributions to the USENIX NSDI’23 conference highlight our commitment to advancing the field of networking research and developing innovative solutions to real-world networking problems, leveraging strong academic collaborations. We look forward to continuing to push the boundaries of what is possible in networking research and delivering cutting-edge solutions to our customers.

A complete list of Microsoft papers accepted at USENIX NSDI’23:

  1. Understanding RDMA Microarchitecture Resources for Performance Isolation, Xinhao Kong and Jingrong Chen, Duke University; Wei Bai, Microsoft; Yechen Xu, Shanghai Jiao Tong University; Mahmoud Elhaddad, Shachar Raindel, and Jitendra Padhye, Microsoft; Alvin R. Lebeck and Danyang Zhuo, Duke University
  2. Empowering Azure Storage with RDMA, Wei Bai, Shanim Sainul Abdeen, Ankit Agrawal, Krishan Kumar Attre, Paramvir Bahl, Ameya Bhagat, Gowri Bhaskara, Tanya Brokhman, Lei Cao, Ahmad Cheema, Rebecca Chow, Jeff Cohen, Mahmoud Elhaddad, Vivek Ette, Igal Figlin, Daniel Firestone, Mathew George, Ilya German, Lakhmeet Ghai, Eric Green, Albert Greenberg, Manish Gupta, Randy Haagens, Matthew Hendel, Ridwan Howlader, Neetha John, Julia Johnstone, Tom Jolly, Greg Kramer, David Kruse, Ankit Kumar, Erica Lan, Ivan Lee, Avi Levy, Marina Lipshteyn, Xin Liu, Chen Liu, Guohan Lu, Yuemin Lu, Xiakun Lu, Vadim Makhervaks, Ulad Malashanka, David A. Maltz, Ilias Marinos, Rohan Mehta, Sharda Murthi, Anup Namdhari, Aaron Ogus, Jitendra Padhye, Madhav Pandya, Douglas Phillips, Adrian Power, Suraj Puri, Shachar Raindel, Jordan Rhee, Anthony Russo, Maneesh Sah, Ali Sheriff, Chris Sparacino, Ashutosh Srivastava, Weixiang Sun, Nick Swanson, Fuhou Tian, Lukasz Tomczyk, Vamsi Vadlamuri, Alec Wolman, Ying Xie, Joyce Yom, Lihua Yuan, Yanzhao Zhang, and Brian Zill, Microsoft
  3. ARK: GPU-driven Code Execution for Distributed Deep Learning, Changho Hwang, KAIST, Microsoft Research; KyoungSoo Park, KAIST; Ran Shu, Xinyuan Qu, Peng Cheng, and Yongqiang Xiong, Microsoft Research
  4. Hydra: Serialization-Free Network Ordering for Strongly Consistent Distributed Applications, Inho Choi, National University of Singapore; Ellis Michael, University of Washington; Yunfan Li, National University of Singapore; Dan R. K. Ports, Microsoft Research; Jialin Li, National University of Singapore
  5. Waverunner: An Elegant Approach to Hardware Acceleration of State Machine Replication, Mohammadreza Alimadadi and Hieu Mai, Stony Brook University; Shenghsun Cho, Microsoft; Michael Ferdman, Peter Milder, and Shuai Mu, Stony Brook University
  6. Scalable Distributed Massive MIMO Baseband Processing, Junzhi Gong, Harvard University; Anuj Kalia, Microsoft; Minlan Yu, Harvard University
  7. Unlocking unallocated cloud capacity for long, uninterruptible workloads, Anup Agarwal, Carnegie Mellon University; Shadi Noghabi, Microsoft Research; Íñigo Goiri, Azure Systems Research; Srinivasan Seshan, Carnegie Mellon University; Anirudh Badam, Microsoft Research
  8. Invisinets: Removing Networking from Cloud Networks, Sarah McClure and Zeke Medley, UC Berkeley; Deepak Bansal and Karthick Jayaraman, Microsoft; Ashok Narayanan, Google; Jitendra Padhye, Microsoft; Sylvia Ratnasamy, UC Berkeley and Google; Anees Shaikh, Google; Rishabh Tewari, Microsoft
  9. Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs, John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, and Yifan Qiao, UCLA; Zhihao Jia, CMU; Minjia Zhang, Microsoft Research; Ravi Netravali, Princeton University; Guoqing Harry Xu, UCLA
  10. OneWAN is better than two: Unifying a split WAN architecture, Umesh Krishnaswamy, Microsoft; Rachee Singh, Microsoft and Cornell University; Paul Mattes, Paul-Andre C Bissonnette, Nikolaj Bjørner, Zahira Nasrin, Sonal Kothari, Prabhakar Reddy, John Abeln, Srikanth Kandula, Himanshu Raj, Luis Irun-Briz, Jamie Gaudette, and Erica Lan, Microsoft
  11. TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches, Aashaka Shah, University of Texas at Austin; Vijay Chidambaram, University of Texas at Austin and VMware Research; Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, and Olli Saarikivi, Microsoft Research; Rachee Singh, Microsoft and Cornell University
  12. Synthesizing Runtime Programmable Switch Updates, Yiming Qiu, Rice University; Ryan Beckett, Microsoft; Ang Chen, Rice University
  13. Formal Methods for Network Performance Analysis, Mina Tahmasbi Arashloo, University of Waterloo; Ryan Beckett, Microsoft Research; Rachit Agarwal, Cornell University
  14. Scalable Tail Latency Estimation for Data Center Networks, Kevin Zhao, University of Washington; Prateesh Goyal, Microsoft Research; Mohammad Alizadeh, MIT CSAIL; Thomas E. Anderson, University of Washington
  15. Addax: A fast, private, and accountable ad exchange infrastructure, Ke Zhong, Yiping Ma, and Yifeng Mao, University of Pennsylvania; Sebastian Angel, University of Pennsylvania & Microsoft Research
  16. RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics, Mehrdad Khani, MIT CSAIL and Microsoft; Ganesh Ananthanarayanan and Kevin Hsieh, Microsoft; Junchen Jiang, University of Chicago; Ravi Netravali, Princeton University; Yuanchao Shu, Zhejiang University; Mohammad Alizadeh, MIT CSAIL; Victor Bahl, Microsoft
  17. Tambur: Efficient loss recovery for videoconferencing via streaming codes, Michael Rudow, Carnegie Mellon University; Francis Y. Yan, Microsoft Research; Abhishek Kumar, Carnegie Mellon University; Ganesh Ananthanarayanan and Martin Ellis, Microsoft; K.V. Rashmi, Carnegie Mellon University
  18. Gemel: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge, Arthi Padmanabhan, UCLA; Neil Agarwal, Princeton University; Anand Iyer and Ganesh Ananthanarayanan, Microsoft Research; Yuanchao Shu, Zhejiang University; Nikolaos Karianakis, Microsoft Research; Guoqing Harry Xu, UCLA; Ravi Netravali, Princeton University
  19. On Modular Learning of Distributed Systems for Predicting End-to-End Latency, Chieh-Jan Mike Liang, Microsoft Research; Zilin Fang, Carnegie Mellon University; Yuqing Xie, Tsinghua University; Fan Yang, Microsoft Research; Zhao Lucis Li, University of Science and Technology of China; Li Lyna Zhang, Mao Yang, and Lidong Zhou, Microsoft Research
  20. SelfTune: Tuning Cluster Managers, Ajaykrishna Karthikeyan and Nagarajan Natarajan, Microsoft Research; Gagan Somashekar, Stony Brook University; Lei Zhao, Microsoft; Ranjita Bhagwan, Microsoft Research; Rodrigo Fonseca, Tatiana Racheva, and Yogesh Bansal, Microsoft
  21. OpenLoRa: Validating LoRa Implementations through an Extensible and Open-sourced Framework, Manan Mishra, Daniel Koch, Muhammad Osama Shahid, and Bhuvana Krishnaswamy, University of Wisconsin-Madison; Krishna Chintalapudi, Microsoft Research; Suman Banerjee, University of Wisconsin-Madison
  22. ExoPlane: An Operating System for On-Rack Switch Resource Augmentation, Daehyeok Kim, Microsoft and University of Texas at Austin; Vyas Sekar and Srinivasan Seshan, Carnegie Mellon University
  23. Sketchovsky: Enabling Ensembles of Sketches on Programmable Switches, Hun Namkung, Carnegie Mellon University; Zaoxing Liu, Boston University; Daehyeok Kim, Microsoft Research; Vyas Sekar and Peter Steenkiste, Carnegie Mellon University
  24. Acoustic Sensing and Communication Using Metasurface, Yongzhao Zhang, Yezhou Wang, and Lanqing Yang, Shanghai Jiao Tong University; Mei Wang, UT Austin; Yi-Chao Chen, Shanghai Jiao Tong University and Microsoft Research Asia; Lili Qiu, UT Austin and Microsoft Research Asia; Yihong Liu, University of Glasgow; Guangtao Xue and Jiadi Yu, Shanghai Jiao Tong University
  25. Disaggregating Stateful Network Functions, Deepak Bansal, Gerald DeGrace, Rishabh Tewari, Michal Zygmunt, and James Grantham, Microsoft; Silvano Gai, Mario Baldi, Krishna Doddapaneni, Arun Selvarajan, Arunkumar Arumugam, and Balakrishnan Raman, AMD Pensando; Avijit Gupta, Sachin Jain, Deven Jagasia, Evan Langlais, Pranjal Srivastava, Rishiraj Hazarika, Neeraj Motwani, Soumya Tiwari, Stewart Grant, Ranveer Chandra, and Srikanth Kandula, Microsoft
  26. Doing More with Less: Orchestrating Serverless Applications without an Orchestrator, David H. Liu and Amit Levy, Princeton University; Shadi Noghabi and Sebastian Burckhardt, Microsoft Research
  27. NetPanel: Traffic Measurement of Exchange Online Service, Yu Chen, Microsoft 365, China; Liqun Li and Yu Kang, Microsoft Research, China; Boyang Zheng, Yehan Wang, More Zhou, Yuchao Dai, and Zhenguo Yang, Microsoft 365, China; Brad Rutkowski and Jeff Mealiffe, Microsoft 365, USA; Qingwei Lin, Microsoft Research, China
  28. DOTE: Rethinking (Predictive) WAN Traffic Engineering, Yarin Perry, Hebrew University of Jerusalem; Felipe Vieira Frujeri, Microsoft Research; Chaim Hoch, Hebrew University of Jerusalem; Srikanth Kandula and Ishai Menache, Microsoft Research; Michael Schapira, Hebrew University of Jerusalem; Aviv Tamar, Technion
  29. Push-Button Reliability Testing for Cloud-Backed Applications with Rainmaker, Yinfang Chen and Xudong Sun, University of Illinois at Urbana-Champaign; Suman Nath, Microsoft Research; Ze Yang and Tianyin Xu, University of Illinois at Urbana-Champaign
  30. Test Coverage for Network Configurations, Xieyang Xu and Weixin Deng, University of Washington; Ryan Beckett, Microsoft; Ratul Mahajan, University of Washington; David Walker, Princeton University
