Fix gradient accumulation (#1086)
* Fix gradient accumulation
  - Add ``is_scaled_loss`` flag to support both scaled / unscaled loss
  - Add a method `scale_grad_by_num_grads_to_accum` to handle gradient accumulation using unscaled loss more explicitly
  - Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``
  - Add tests for gradient
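The two modes behind ``is_scaled_loss`` can be illustrated with a minimal sketch. It assumes the common convention that a "scaled" loss has already been divided by the number of accumulation steps before backward, while an "unscaled" loss has its summed gradient divided once afterwards. The function names below (`grad_mse`, `accumulate_grad`, and the stand-in `scale_grad_by_num_grads_to_accum`) are hypothetical illustrations, not the project's actual API, and the model is a single scalar weight with a hand-written gradient so the example runs without any framework.

```python
from typing import Sequence, Tuple


def grad_mse(w: float, x: float, y: float) -> float:
    """Gradient w.r.t. w of the per-sample loss 0.5 * (w * x - y) ** 2."""
    return (w * x - y) * x


def scale_grad_by_num_grads_to_accum(grad: float, num_grads_to_accum: int) -> float:
    # Hypothetical stand-in: divide the accumulated gradient once.
    return grad / num_grads_to_accum


def accumulate_grad(
    w: float,
    micro_batches: Sequence[Tuple[float, float]],
    is_scaled_loss: bool,
) -> float:
    """Accumulate gradients over micro-batches (one sample each here)."""
    n = len(micro_batches)
    total = 0.0
    for x, y in micro_batches:
        g = grad_mse(w, x, y)
        # A scaled loss is loss / n, so its gradient arrives already divided by n.
        total += g / n if is_scaled_loss else g
    if not is_scaled_loss:
        # Unscaled loss: raw gradients were summed, so divide exactly once here.
        total = scale_grad_by_num_grads_to_accum(total, n)
    return total


data = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.0), (-1.0, 1.0)]
print(accumulate_grad(0.5, data, is_scaled_loss=True))
print(accumulate_grad(0.5, data, is_scaled_loss=False))
```

Both calls yield the mean gradient (1.125 for this data). Applying the explicit division to a loss that was already scaled would divide by `n` twice; keeping the choice behind a single flag is one way to make that mistake harder.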