Congestion Control for Large-Scale RDMA Deployments

  • Yibo Zhu
  • Yibo Zhu
  • Haggai Eran
  • Daniel Firestone
  • Chuanxiong Guo
  • Marina Lipshteyn
  • Yehonatan Liron
  • Jitendra Padhye
  • Shachar Raindel
  • Mohamad Haj Yahia
  • Ming Zhang

SIGCOMM |

Published by ACM - Association for Computing Machinery

View Publication | View Publication

Modern datacenter applications demand high throughput (40Gbps) and ultra-low latency (< 10 microsecond per hop) from the network, with low CPU overhead. Standard TCP/IP stacks cannot meet these requirements, but Remote Direct Memory Access (RDMA) can. On IP-routed datacenter networks, RDMA is deployed using RoCEv2 protocol, which relies on Priority-based Flow Control (PFC) to enable a drop-free network. However, PFC can lead to poor application performance due to problems like head-of-line blocking and unfairness. To alleviates these problems, we introduce DCQCN, an end-to-end congestion control scheme for RoCEv2. To optimize DCQCN performance, we build a fluid model, and provide guidelines for tuning switch buffer thresholds, and other protocol parameters. Using a 3-tier Clos network testbed, we show that DCQCN dramatically improves throughput and fairness of RoCEv2 RDMA traffic. DCQCN is implemented in Mellanox NICs, and is being deployed in Microsoft’s datacenters.