[feat] ShardedOptim: Distributed Grad Scaler (for torch AMP) (#182)
* adding a shard-aware GradScaler wrap, credits to Sean Naren for the idea
* adding stubs & explanations in the documentation
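
For context, a minimal sketch of why a grad scaler has to be shard-aware. This is not the code from this commit; it is a pure-Python simulation under stated assumptions. With a sharded optimizer, each rank only inspects the gradients of its own shard, so an inf/NaN found on one rank has to be shared with all the others, otherwise some ranks would step while others skip. The helper names (`shard_has_overflow`, `all_reduce_max`, `should_skip_step`) are hypothetical, and `all_reduce_max` stands in for a collective such as `torch.distributed.all_reduce` with a MAX reduction.

```python
import math
from typing import List


def shard_has_overflow(grads: List[float]) -> bool:
    """Return True if any gradient in this rank's shard is inf or NaN."""
    return any(not math.isfinite(g) for g in grads)


def all_reduce_max(flags: List[bool]) -> bool:
    """Hypothetical stand-in for a MAX all-reduce across ranks.

    In a real distributed run every rank would receive this same value.
    """
    return any(flags)


def should_skip_step(shards: List[List[float]]) -> bool:
    """Decide, consistently for all ranks, whether the optimizer step is skipped."""
    local_flags = [shard_has_overflow(shard) for shard in shards]
    return all_reduce_max(local_flags)


# Two simulated ranks: only the second one sees an overflow in its shard.
shards = [[0.1, -2.0], [float("inf"), 0.3]]
local_flags = [shard_has_overflow(s) for s in shards]
skip = should_skip_step(shards)
```

Without the reduction, rank 0 would see only finite gradients and apply its update while rank 1 skipped, leaving the shards inconsistent; with it, both ranks agree to skip.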