Synthesized Collective Communication Library (SCCL)
The Synthesized Collective Communication Library (SCCL) is a tool for synthesizing collective algorithms tailored to a particular hardware topology. It creates high-performance collective communication algorithms for the uncommon interconnect topologies that connect AI accelerators inside servers. The project includes the core synthesizer logic as well as routines for lowering the synthesized algorithms to backend implementations.

The objectives for this project include:

- Making AI workloads (both model-parallel inference and any parallel training), as well as any HPC workloads, faster on the kinds of multi-GPU servers deployed in Azure.
- Providing a tool for hardware designers to understand their designs by seeing how well the optimal algorithms synthesized by SCCL perform.
- Enabling the exploration of a new class of optimizations involving custom collective primitives. Currently, implementing efficient communication algorithms for custom, application-specific collectives is very labor intensive.