[feat] ShardedOptim: Distributed Grad Scaler (for torch AMP) (#182)
* adding a shard-aware GradScaler wrap, credits to Sean Naren for the idea
* adding stubs & explanations in the documentation