Fix zero stage2 cpu_offload when some model trainable parameters skipped in training (#861)
* Fix ZeRO stage 2 cpu_offload when some trainable model parameters are skipped during training, as in https://github.com/microsoft/DeepSpeed/issues/707. When trainable parameters are skipped during training, the backward hooks registered for them in self.create_reduce_and_remove_grad_hooks() never run, so they have no entry in norm_for_param_grads.

* Trim space

* Trim space

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
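A minimal sketch of the guard this fix implies (hypothetical names, not the actual DeepSpeed code): when aggregating the gradient norm, skip any parameter that has no entry in norm_for_param_grads, since its backward hook never fired.

```python
import math

def complete_grad_norm(params, norm_for_param_grads):
    """Aggregate per-parameter gradient norms into a global L2 norm.

    Hypothetical sketch: a parameter skipped in the forward/backward
    pass never had its reduce hook run, so it has no entry in
    norm_for_param_grads; guard the lookup instead of assuming it exists.
    """
    total_sq = 0.0
    for p in params:
        if p in norm_for_param_grads:  # guard: skipped params have no norm
            total_sq += norm_for_param_grads[p] ** 2
    return math.sqrt(total_sq)

# "w3" was trainable but skipped in training, so it has no recorded norm.
norms = {"w1": 3.0, "w2": 4.0}
print(complete_grad_norm(["w1", "w2", "w3"], norms))  # 5.0
```

Without the membership check, the lookup for a skipped parameter would raise a KeyError, which is the failure mode reported in the linked issue.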