    [feat] sync adascale from internal repo, support add_param_group (#266) · 3932a1f6
    Min Xu authored
    * [feat] sync adascale from internal repo
    
    - tbd
    
    testing: tbd
    
    * Update argument documentation of __init__
    
    * update documentation around set_num_gradients_to_accumulate
    
    * added checks to ensure the APIs are called in the proper places
    
    * renamed internal APIs so they are clearly marked as internal
    
    * updated changelog
    
    * added support for add_param_group and its unit test
    
    * added unit test for set_num_gradients_to_accumulate
    
    * added debias_ewma unit test
    
    * fixed test_set_num_gradients_to_accumulate (needed a zero_grad() call)
    
    * added missing zero_grad() to test_lr_scheduler
    
    * fixed test_add_param_group with respect to optim.zero_grad()
    
    * added test_gradient_value
    
    * added test_scale_not_equal_default for scale != world_size * grad_accum
    
    * added test_unhook()
    
    * removed print statements
    
    * fixed a typo
    
    * addressed Ben's comment
test_single_node_adascale.py 14.9 KB