1. 05 Oct, 2022 1 commit
    • Fix gradient accumulation (#1086) · f5e727cc
      Changyu Gao authored
      * Fix gradient accumulation
      
      - Add ``is_scaled_loss`` flag to support both scaled and unscaled loss
      - Add a method ``scale_grad_by_num_grads_to_accum`` to handle gradient accumulation with unscaled loss more explicitly
      - Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``
      - Add tests for gradient accumulation (a sketch of the two loss-scaling styles follows below)
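      
      For context, a minimal PyTorch sketch of the two accumulation styles that ``is_scaled_loss`` distinguishes. The model and loop are illustrative, not the fairscale implementation; the final rescale in the second loop is the job a helper like ``scale_grad_by_num_grads_to_accum`` performs.
      
      ```python
      import torch

      model = torch.nn.Linear(4, 2)
      opt = torch.optim.SGD(model.parameters(), lr=0.1)
      num_grads_to_accum = 4
      batches = [torch.randn(8, 4) for _ in range(8)]  # toy micro-batches

      # Scaled loss (is_scaled_loss=True): divide the loss up front, so the
      # accumulated gradient is already the mean over micro-batches.
      for step, batch in enumerate(batches):
          (model(batch).sum() / num_grads_to_accum).backward()
          if (step + 1) % num_grads_to_accum == 0:
              opt.step()
              opt.zero_grad()

      # Unscaled loss (is_scaled_loss=False): backward on the raw loss, then
      # rescale the summed gradients once before stepping.
      for step, batch in enumerate(batches):
          model(batch).sum().backward()
          if (step + 1) % num_grads_to_accum == 0:
              with torch.no_grad():
                  for p in model.parameters():
                      if p.grad is not None:
                          p.grad.div_(num_grads_to_accum)
              opt.step()
              opt.zero_grad()
      ```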
  2. 24 Sep, 2022 1 commit
  3. 12 Jun, 2022 1 commit
  4. 04 Mar, 2021 1 commit
    • [test] AdaScale & SDP/FSDP (#468) · efed9cee
      Min Xu authored
      - cover them in terms of code path only
      - numerically, AdaScale behaves differently on SDP/FSDP than on DDP, mainly
        due to each rank's partial view of the gradients (see the sketch below).
      - this doesn't mean it is definitely not useful, but it has yet to
        be validated.
      - not going to spend too much time on it until we have a real use case.
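      
      For reference, a minimal single-process sketch of the AdaScale usage being covered, assuming fairscale's ``AdaScale`` wrapper (setting ``num_gradients_to_accumulate`` keeps its variance estimate well defined without multiple workers); this is not the commit's actual test:
      
      ```python
      import torch
      from fairscale.optim import AdaScale

      model = torch.nn.Linear(4, 2)
      optim = AdaScale(torch.optim.SGD(model.parameters(), lr=0.1),
                       num_gradients_to_accumulate=2)

      for _ in range(2):  # two accumulated backwards per step
          model(torch.randn(8, 4)).sum().backward()

      # The gain is estimated from gradient statistics; under SDP/FSDP each
      # rank computes its local statistics from only a shard of the
      # gradients, hence the numerical difference vs. DDP noted above.
      print(optim.gain())
      optim.step()
      optim.zero_grad()
      ```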
  5. 22 Feb, 2021 1 commit
  6. 28 Jan, 2021 1 commit
    • [test]: test adascale with oss (#328) · fa11d338
      Min Xu authored
      * [test]: test adascale with oss
      
      * minor fix
      
      * add a small comment
      
      * refactor: moved find_tensor_by_shape
      
      * refactor: move test golden data into its own module
      
      * refactor: simplified the train function
      
      * refactor: added comments as suggested (an AdaScale + OSS usage sketch follows below)
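      
      A rough sketch of the AdaScale + OSS combination this test covers, assuming fairscale's ``OSS`` and ``AdaScale`` APIs; the process-group setup and toy model are illustrative:
      
      ```python
      import torch
      import torch.distributed as dist
      from torch.nn.parallel import DistributedDataParallel as DDP
      from fairscale.optim import OSS, AdaScale


      def run(rank: int, world_size: int) -> None:
          dist.init_process_group(
              "gloo", init_method="tcp://127.0.0.1:29501",
              rank=rank, world_size=world_size,
          )
          model = DDP(torch.nn.Linear(4, 2))  # grads averaged across ranks
          # OSS shards optimizer state across ranks; AdaScale sits on top
          # and adapts the learning rate from gradient statistics.
          optim = AdaScale(OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1))
          model(torch.randn(8, 4)).sum().backward()
          optim.step()
          optim.zero_grad()
          dist.destroy_process_group()


      if __name__ == "__main__":
          torch.multiprocessing.spawn(run, args=(2,), nprocs=2)
      ```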
  7. 05 Jan, 2021 1 commit
    • [fix] Flaky tests (#283) · 79365ee6
      Benjamin Lefaudeux authored
      * adding the pytest-timeout plugin to properly root out hanging tests (example below)
      * removing redundant code and using a slightly more reasonable timeout; works on a single CUDA device
      * finding the root cause of some of the CPU hangs: RPC init
      * propagating all the RPC init test changes to the pipe and model-parallel tests
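      
      A minimal example of the pytest-timeout plugin mentioned above; the test name and body are placeholders:
      
      ```python
      import pytest


      # Requires the pytest-timeout plugin (pip install pytest-timeout).
      @pytest.mark.timeout(60)  # fail instead of hanging past 60 seconds
      def test_does_not_hang():
          assert 1 + 1 == 2  # real tests exercise RPC initialization
      ```
      
      A suite-wide default can also be set with ``pytest --timeout=60`` instead of per-test marks.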
  8. 14 Dec, 2020 1 commit
  9. 03 Dec, 2020 1 commit
    • [feat] AdaScale: Gradient Accumulation and Add PyTest unit tests (#202) · ce5860ea
      Min Xu authored
      * added AdaScale to README
      
      * [adascale] added gradient accumulation
      
      - added gradient accumulation
      - tested with full CIFAR trainings with different values of accumulation
        and verified that full accuracy is obtained
      - also removed the patch optimize flag until we need it
      
      * [adascale] adding pytest
      
      - added basic, DDP, and grad_accum tests
      - closes #195
      
      * added changelog
      
      * added ddp grad_accum test
      
      * moved ddp and non-ddp tests into separate files
      
      * added checkpoint test
      
      * more doc
      
      * addressed Mike's comments (a usage sketch of accumulation and checkpointing follows below)
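      
      A hedged sketch of the accumulation and checkpointing flow these tests cover, assuming fairscale's ``AdaScale`` API (``num_gradients_to_accumulate``, ``set_num_gradients_to_accumulate``, and optimizer-style ``state_dict``/``load_state_dict``); the training loop is illustrative:
      
      ```python
      import torch
      from fairscale.optim import AdaScale

      model = torch.nn.Linear(4, 2)
      optim = AdaScale(torch.optim.SGD(model.parameters(), lr=0.1),
                       num_gradients_to_accumulate=2)

      batches = [torch.randn(8, 4) for _ in range(8)]
      for i, batch in enumerate(batches):
          model(batch).sum().backward()
          if (i + 1) % 2 == 0:  # one optimizer step per 2 accumulated backwards
              optim.step()
              optim.zero_grad()

      # The accumulation factor can be changed between steps ...
      optim.set_num_gradients_to_accumulate(4)

      # ... and checkpointing round-trips like a regular optimizer, as the
      # checkpoint test exercises.
      state = optim.state_dict()
      optim.load_state_dict(state)
      ```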