Oct 17, 2025 / 17 min read

Distributed Muon: Custom Gradient Synchronization for Memory-Efficient Training