ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY

CoNEXT'16 |

Publication

Data center networks and especially drop-free RoCEv2 networks require efficient congestion control protocols. DCQCN (ECN-based) and TIMELY (delay-based) are two recent proposals for this purpose. In this paper, we analyze DCQCN and TIMELY using fluid models and simulations, for stability, convergence, fairness and flow completion time. We uncover several surprising behaviors of these protocols. For example, we show that DCQCN exhibits non-monotonic stability behavior, and that TIMELY can converge to stable regime with arbitrary unfairness. We propose simple fixes and tuning for ensuring that both protocols converge to and are stable at the fair share point. Finally, using lessons learnt from the analysis, we address the broader question: are there fundamental reasons to prefer either ECN or delay for end-to-end congestion control in data center networks? We argue that ECN is a better congestion signal, due to the way modern switches mark packets, and due to a fundamental limitation of end-to-end delay-based protocols, that we derive