To make cloud computing work, we must make applications run substantially faster, both over the Internet and within data centers. Our measurements of real applications show that today’s protocols fall short, leading to slow page-load times across the Internet and congestion collapse inside the data center. We have developed a new suite of architectures and protocols that boost the performance and robustness of communications to overcome these problems.
About Cloud Faster
The results are backed by real measurements and a new theory describing protocol dynamics that enables us to remedy fundamental problems in the Transmission Control Protocol (TCP).
To speed up the cloud, we have developed two suites of technology:
- DCTCP – changes to the congestion control algorithm of TCP that decrease application latency inside the data center by reducing queue lengths and packet loss while maintaining high throughput.
- WideArea TCP – changes to the network stack of the “last-hop server” – the last server to touch packets before they travel to the client – that reduce the latency of transferring small objects (5 to 40 KB) by working around last-mile impairments such as packet loss and high round-trip time (RTT).
We will demo the experience users will have with Bing Web sites, both with and without our improvements. The difference is stunning. We also will show visualizations of intra-data-center communication problems and our changes that fix them. This work stems from collaborations with Bing and Windows Core Operating System Networking.
DCTCP – Reducing Latency Inside the Data Center
The following videos show what happens when a server (marked 21) in a rack sends a request for information to 20 other servers in the same rack, and then waits for their responses so that it can formulate a summary. This Partition/Aggregate pattern is very common in data center applications, forming the heart of applications as diverse as Web search (querying very large indexes), ad placement (finding the best ads to show with a Web page), and social networking (finding all of a user’s friends, or the most interesting information to show to that user).
In both videos, we see a burst of activity as the request is sent out, with all servers responding at roughly the same time with a burst of packets that carries the first part of their response. This burst is known as incast, and it causes the queue at the switch to grow rapidly (shown as blocks extending out at a 45-degree angle).
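To get a feel for why a synchronized burst fills the switch queue so quickly, here is a minimal sketch of incast at a single switch port. Only the count of 20 responding servers comes from the description above. The function name, the burst size, the buffer size, and the one-packet-per-tick drain rate are all hypothetical values chosen for illustration, not measurements from the demo.

```python
def simulate_incast(num_senders=20, burst_packets=10,
                    buffer_packets=100, drain_per_tick=1):
    """Toy model of one synchronized burst into a single switch port.

    All parameters are illustrative assumptions. Each tick, every sender
    emits one packet; the port then forwards `drain_per_tick` packets.
    Returns (peak_queue_length, packets_dropped).
    """
    queue = 0
    dropped = 0
    peak = 0
    for _ in range(burst_packets):
        arrivals = num_senders
        space = buffer_packets - queue
        accepted = min(arrivals, space)
        dropped += arrivals - accepted   # packets that overflow the buffer
        queue += accepted
        peak = max(peak, queue)
        queue -= min(queue, drain_per_tick)  # port forwards packets
    return peak, dropped


if __name__ == "__main__":
    peak, dropped = simulate_incast()
    print(f"20 senders: peak queue = {peak}, dropped = {dropped}")
    peak, dropped = simulate_incast(num_senders=1)
    print(f" 1 sender:  peak queue = {peak}, dropped = {dropped}")
```

In this toy model, 20 senders arriving together outpace the drain rate by a wide margin, so the buffer fills within a few ticks and later packets are dropped, while a single sender never builds a queue at all.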
In the case of DCTCP, senders start receiving congestion notifications much earlier than with TCP. They adjust their sending rates, and the queue never overflows. Even after the initial burst of activity, the operation of DCTCP is much smoother than that of TCP, with senders offering roughly equal amounts of traffic, so much so that they even appear to be “taking turns.”
In the case of TCP, packets are lost as the queue grows too large, and the system enters a phase of very uneven and unfair operation: some servers send lots of data, while others send none for long periods of time. The queue length swings between full and empty, even causing link utilization to drop to zero for periods of time before the transfer is completed.
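The contrast between the two cases can be sketched in a few lines. This page does not spell out the sender-side algorithm, so the sketch below assumes the update rule from the publicly documented DCTCP specification (RFC 8257): the sender keeps a running estimate, alpha, of the fraction of its traffic that was marked as congested, and shrinks its window in proportion to alpha instead of always halving it. The function names and the example values are hypothetical.

```python
def update_alpha(alpha, marked_fraction, g=1 / 16):
    """Moving average of the fraction of marked traffic.

    `g` is the estimation gain; 1/16 is the value suggested in RFC 8257.
    `marked_fraction` is the share of the last window that carried
    congestion marks (0.0 to 1.0).
    """
    return (1 - g) * alpha + g * marked_fraction


def dctcp_window(cwnd, alpha):
    """Shrink the window in proportion to the estimated congestion."""
    return max(1.0, cwnd * (1 - alpha / 2))


def tcp_window_on_loss(cwnd):
    """Conventional TCP reaction: halve the window when a loss is detected."""
    return max(1.0, cwnd / 2)


if __name__ == "__main__":
    cwnd = 100.0
    alpha = 0.0
    # Assumed scenario: half of each window is marked for three rounds.
    for round_number in range(1, 4):
        alpha = update_alpha(alpha, marked_fraction=0.5)
        cwnd = dctcp_window(cwnd, alpha)
        print(f"round {round_number}: alpha = {alpha:.4f}, cwnd = {cwnd:.2f}")
    print("TCP after one loss:", tcp_window_on_loss(100.0))
```

Under mild congestion alpha stays small, so the DCTCP window shrinks only slightly each round, which is consistent with the smooth behavior seen in the DCTCP video. When every packet is marked, alpha approaches 1 and the reduction approaches the halving that TCP applies after a loss.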