Do not store input activations when not computing weight gradients (#739)

* Do not store input activations when not computing weight gradients

Signed-off-by: Sangkug Lym <slym@nvidia.com>

* fix userbuffer tp comm overlap case

Signed-off-by: Sangkug Lym <slym@nvidia.com>

---------

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
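
The idea named in the commit title can be illustrated with a toy linear layer. This is a minimal sketch in plain Python, not the code changed by this commit: the class `LinearSketch` and the flag `weight_requires_grad` are hypothetical names chosen for illustration. It assumes a layer `y = x @ W`, where the input gradient `dX = dY @ W^T` needs only the weight, while the weight gradient `dW = X^T @ dY` is the only consumer of the saved input.

```python
class LinearSketch:
    """Toy linear layer y = x @ W on nested lists (hypothetical, illustration only)."""

    def __init__(self, weight, weight_requires_grad=True):
        self.weight = weight  # shape: [in_features][out_features]
        self.weight_requires_grad = weight_requires_grad
        self.saved_input = None

    def forward(self, x):
        # x has shape [batch][in_features].
        # Keep the input only if backward will need it to compute dW.
        self.saved_input = x if self.weight_requires_grad else None
        columns = list(zip(*self.weight))
        return [[sum(xi * w for xi, w in zip(row, col)) for col in columns]
                for row in x]

    def backward(self, grad_out):
        # dX = dY @ W^T uses only the weight, never the saved input.
        grad_input = [[sum(g * w for g, w in zip(grow, wrow))
                       for wrow in self.weight]
                      for grow in grad_out]
        grad_weight = None
        if self.weight_requires_grad:
            # dW = X^T @ dY is the only place the saved input is read.
            x = self.saved_input
            n_out = len(grad_out[0])
            grad_weight = [[sum(xr[i] * gr[j] for xr, gr in zip(x, grad_out))
                            for j in range(n_out)]
                           for i in range(len(self.weight))]
        return grad_input, grad_weight


frozen = LinearSketch([[1, 2], [3, 4]], weight_requires_grad=False)
y = frozen.forward([[1, 1]])
grad_input, grad_weight = frozen.backward([[1, 0]])
```

With the weight frozen, `saved_input` stays `None` after the forward pass, so the activation memory can be released while the input gradient is still computed correctly.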