Candidate Talk: Reliable Communication for Datacenters
- Mahesh Balakrishnan | Cornell University
Commodity datacenters built from inexpensive machines are the computing platform of choice for a wide range of applications, from online services and search engines to finance and e-science. Networks within datacenters are complex and often chaotic, with nodes sending and receiving data over many different channels. In addition, datacenters are linked by high-speed optical networks that shuttle data to remote mirrors for disaster tolerance, client locality and energy savings. As the networks within and between datacenters grow in capacity and complexity, the commodity ‘blade-servers’ inside are unable to keep up: they either fail to fully utilize the links they are attached to or stall under traffic spikes. In particular, the protocols running on these machines react to data loss in the network in fundamentally unstable and inefficient ways.
This talk presents two systems for reliable datacenter communication. Ricochet is a reliable multicast protocol for communication within a datacenter, and Maelstrom is a transparent proxy for communication between datacenters. Both systems use Forward Error Correction (FEC) techniques in new ways that enable timely and scalable packet recovery, making key choices about where to generate redundant XORs and which packets to include in them. We show that proactive error correction can be a powerful reliability primitive for constructing fault-tolerant systems that recover rapidly and gracefully from failure.
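As a rough illustration of the redundant XORs mentioned above, the sketch below shows generic XOR-based FEC: a sender XORs a group of packets into one repair packet, and a receiver that lost exactly one packet from the group rebuilds it from the rest. This is not the Ricochet or Maelstrom encoding. The function names, the fixed group of three packets, and the equal packet lengths are all assumptions made for the example.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (equal length is an assumption here)."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets):
    """Build one repair packet as the XOR of every packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received, repair):
    """Rebuild a single lost packet from the survivors and the repair packet.

    `received` holds the group's packets in order, with None marking a loss.
    Returns (index, payload) when exactly one packet is missing, else None,
    because a single XOR can repair only one loss per group.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        return None
    survivors = [p for p in received if p is not None]
    # XORing the repair packet with all survivors cancels them out,
    # leaving only the payload of the missing packet.
    return missing[0], reduce(xor_bytes, survivors, repair)


# Hypothetical usage: three packets in a group, the second one is lost.
group = [b"aaaa", b"bbbb", b"cccc"]
repair_packet = make_repair(group)
print(recover([b"aaaa", None, b"cccc"], repair_packet))  # (1, b'bbbb')
```

Because recovery needs no retransmission, its latency does not depend on a round trip to the sender. The cost is the extra repair traffic, and a single XOR tolerates only one loss per group, which is why the choice of which packets go into each XOR matters.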
Speaker Details
Mahesh Balakrishnan is a PhD candidate working with Prof. Ken Birman at Cornell University’s Department of Computer Science. His thesis centers on reliable communication protocols for datacenter environments, and his research interests extend to building reliable and scalable distributed systems of any kind. Prior to joining Cornell, he obtained a BS in Computer Science from Georgia Tech in 2003.
- Jeff Running
- Mahesh Balakrishnan