    Fix gradient accumulation (#1086) · f5e727cc
    Changyu Gao authored
    * Fix gradient accumulation
    
    - Add an ``is_scaled_loss`` flag to support both scaled and unscaled loss
    - Add a method ``scale_grad_by_num_grads_to_accum`` to handle gradient accumulation with unscaled loss more explicitly
    - Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``
    - Add tests for gradient accumulation
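
    The two loss-handling modes that the ``is_scaled_loss`` flag distinguishes can be sketched in plain Python (no framework; the function names below are illustrative, only ``scale_grad_by_num_grads_to_accum`` comes from the commit message, and its exact signature is an assumption):

    ```python
    def accumulate_scaled(grads, num_grads_to_accum):
        # Scaled-loss path: each per-micro-batch gradient is already
        # divided by the accumulation count before being summed.
        return sum(g / num_grads_to_accum for g in grads)

    def accumulate_unscaled(grads, num_grads_to_accum):
        # Unscaled-loss path: sum raw gradients, then rescale once at the
        # end, as a method like ``scale_grad_by_num_grads_to_accum`` would.
        total = sum(grads)
        return total / num_grads_to_accum

    # Toy per-micro-batch gradients: both paths yield the same averaged gradient.
    grads = [0.9, 1.1, 1.0, 1.0]
    assert abs(accumulate_scaled(grads, 4) - accumulate_unscaled(grads, 4)) < 1e-12
    ```

    Handling the unscaled case with an explicit rescaling step keeps the summed gradient exact until the final division, rather than dividing each micro-batch loss up front.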
test_single_node_adascale.py