Efficient Distributed Orthonormal Optimizers for Large-Scale Training
- Kwangjun Ahn, Microsoft
Kwangjun delivered a 50-minute technical talk on recent advances in orthonormal update methods for large-scale AI model training. This topic has been rapidly gaining attention in the community, emerging as a strong successor to AdamW following the success of orthonormal optimizers in training production-scale models such as Kimi-K2 and GLM-4.5.
The talk centered on the design and practice of orthonormal updates, with a focus on optimizers such as Muon and Dion2. While he briefly discussed their theoretical foundations, the emphasis was on practical usage: how to integrate these optimizers into modern training pipelines, interpret their algorithmic components, and leverage the implementation guidelines provided in the open-source codebase on GitHub.
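To make the core idea concrete, the sketch below shows the orthonormal-update step at the heart of Muon: the momentum matrix is approximately orthogonalized (its singular values are pushed toward 1) via a Newton-Schulz iteration, and the orthogonalized matrix is used as the update direction. This is a minimal illustration, not the authors' implementation; the quintic coefficients follow the widely circulated open-source Muon recipe, and the function names and the simplified `muon_step` (which omits shape-dependent scaling and the distributed sharding that Dion-style methods address) are this example's own.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G, i.e. push its singular values toward 1.

    Coefficients follow the quintic Newton-Schulz variant used in the
    open-source Muon implementation; 5 iterations is a common default.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # normalize so all singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                     # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X              # X <- a*X + b*(XX^T)X + c*(XX^T)^2 X
    return X.T if transposed else X

def muon_step(W, M, grad, lr=0.02, beta=0.95):
    """One illustrative Muon-style update for a weight matrix W.

    Real implementations add shape-dependent scaling of the update and
    distribute the work across devices (the problem Dion targets).
    """
    M = beta * M + grad                        # momentum accumulation
    W = W - lr * newton_schulz_orthogonalize(M)
    return W, M
```

Because the Newton-Schulz loop uses only matrix multiplications, it runs efficiently on accelerators and avoids an explicit SVD, which is what makes orthonormal updates practical at scale.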
Kwangjun Ahn
Senior Researcher
Watch next:
Dion2: A new simple method to shrink matrix in Muon
- Anson Ho
- Kwangjun Ahn