Nüwa: Efficient Generative Control Plane for AI Network Simulation
Network simulation plays a critical role in improving the efficiency of AI supercomputers, supporting new design validation, parameter tuning, and network protocol development. However, high-fidelity network simulation is very slow at scale. We observe that the highly inefficient initialization of the network control plane is one of the key reasons for this slow speed. Existing network simulators search for available routes at simulation initialization, which takes hours or even days. Moreover, the resulting large routing tables contain redundant information, which occupies a large amount of memory and makes routing lookups very slow. In this paper, we present Nüwa, an efficient generative control plane for AI network simulation. Nüwa leverages the layered architecture of AI networks to express the routing information of each layer as a formula. The formulas are generated directly from the topology description through an extremely simple transformation. Evaluations show that Nüwa reduces routing table generation from hours to only 20 seconds for 64K nodes. For data plane execution, Nüwa cuts the overall simulation time by more than 100x, because it almost eliminates the forwarding calculation.
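To illustrate the general idea of replacing a stored routing table with a per-layer formula, the sketch below assumes a hypothetical two-layer leaf-spine (Clos) topology. The topology parameters (`hosts_per_leaf`, `num_leaves`, `num_spines`), the port-numbering convention, and the hash-based ECMP uplink choice are all assumptions made for this example. This is not Nüwa's actual formulation. Each layer computes its output port arithmetically from the destination host ID, so no per-destination table has to be built at initialization or searched during forwarding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpine:
    """Hypothetical two-layer Clos topology description (illustrative assumption)."""
    hosts_per_leaf: int
    num_leaves: int
    num_spines: int

    @property
    def num_hosts(self) -> int:
        return self.hosts_per_leaf * self.num_leaves


def leaf_next_port(topo: LeafSpine, leaf: int, dst: int, flow_hash: int) -> int:
    """Leaf-layer formula (assumed convention): ports [0, hosts_per_leaf) face hosts,
    ports [hosts_per_leaf, hosts_per_leaf + num_spines) face spines."""
    if dst // topo.hosts_per_leaf == leaf:
        return dst % topo.hosts_per_leaf  # destination is local: downlink to host
    return topo.hosts_per_leaf + flow_hash % topo.num_spines  # ECMP uplink by flow hash


def spine_next_port(topo: LeafSpine, dst: int) -> int:
    """Spine-layer formula (assumed convention): spine port i connects to leaf i."""
    return dst // topo.hosts_per_leaf


def route(topo: LeafSpine, src: int, dst: int, flow_hash: int) -> list[tuple[str, int]]:
    """Return the (switch, output port) hops from host src to host dst."""
    src_leaf = src // topo.hosts_per_leaf
    port = leaf_next_port(topo, src_leaf, dst, flow_hash)
    hops = [(f"leaf{src_leaf}", port)]
    if port < topo.hosts_per_leaf:
        return hops  # intra-leaf traffic never reaches the spine layer
    spine = port - topo.hosts_per_leaf
    dst_leaf = spine_next_port(topo, dst)
    hops.append((f"spine{spine}", dst_leaf))
    hops.append((f"leaf{dst_leaf}", leaf_next_port(topo, dst_leaf, dst, flow_hash)))
    return hops


def table_entries(topo: LeafSpine) -> int:
    """Entries an explicit per-destination table would need: one per switch per host."""
    return (topo.num_leaves + topo.num_spines) * topo.num_hosts


if __name__ == "__main__":
    topo = LeafSpine(hosts_per_leaf=32, num_leaves=64, num_spines=32)
    print(route(topo, src=0, dst=100, flow_hash=7))  # cross-leaf path via a spine
    print(route(topo, src=0, dst=5, flow_hash=7))    # intra-leaf path
    print(table_entries(topo))                       # size of the equivalent explicit table
```

For this toy 2,048-host topology, the path from host 0 to host 100 is `[("leaf0", 39), ("spine7", 3), ("leaf3", 4)]`, while an explicit per-destination table would hold 196,608 entries. The two formulas stay the same size no matter how many hosts are added. This sketch only conveys why a layered formula can avoid table generation and lookup. Real AI network topologies, failure handling, and Nüwa's transformation from the topology description are not modeled here.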