1. 28 Apr, 2021 2 commits
    • 2bb2a134
      msbaines authored
    • [feat] save memory by using bucket buffer only in backward (#633) · a5594032
      Min Xu authored
      
      
      * [feat] save memory by using bucket buffer only in backward
      
      - this fixes bug #627
      - added documentation to clarify the buffer's cost and the
        speed/memory tradeoff
      - added setup/teardown calls so that the buffer is only allocated
        during the backward pass, freeing memory during forward and the
        optimizer step for things like activations
      - added a unit test that asserts memory usage is within the expected range
      
      Compared with DDP:

        1. the buffer size scales with the number of FSDP instances, not with model size
        2. the buffer is only allocated during the backward pass
        3. the buffer is used only for small tensors, to reduce per-tensor overhead
        4. the overlap of computation and gradient reduction is very different
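      The buffer behavior described above can be sketched in plain Python. This is a
      hedged illustration of the idea (a fixed-size bucket that exists only for the
      duration of backward and accepts only small tensors), not fairscale's actual
      implementation; `ReduceBucket` and `SMALL_TENSOR_THRESHOLD` are hypothetical names.

      ```python
      # Hedged sketch of the bucket-buffer lifecycle: allocated at the start of
      # backward, freed at the end, used only for small gradients. Not the real
      # fairscale FSDP code.

      SMALL_TENSOR_THRESHOLD = 1024  # elements; larger grads bypass the bucket


      class ReduceBucket:
          """A fixed-size buffer that exists only during the backward pass."""

          def __init__(self, capacity):
              self.capacity = capacity
              self.buffer = None  # not allocated until backward starts

          def setup(self):
              # Allocate lazily when backward begins, so forward and the
              # optimizer step can use this memory for activations etc.
              self.buffer = [0.0] * self.capacity

          def try_add(self, grad):
              # Only small tensors go through the bucket; large ones are
              # reduced directly to keep per-bucket copy overhead low.
              if self.buffer is None or len(grad) > SMALL_TENSOR_THRESHOLD:
                  return False
              # (real code would copy into a flat region and track offsets)
              return len(grad) <= self.capacity

          def teardown(self):
              # Free the buffer once backward finishes.
              self.buffer = None
      ```

      Because each FSDP instance would own one such bucket of fixed capacity, total
      buffer memory grows with the number of instances rather than with model size,
      matching point 1 above.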
      
      * add PR number to changelog
      
      * filled in memory numbers on 1.9
      
      * addressed comments
      
      * update comments
      
      * fix for 1.6
      
      * add a todo
      Co-authored-by: Min Xu <min.xu@acm.org>
  2. 26 Apr, 2021 1 commit
  3. 19 Apr, 2021 1 commit
  4. 13 Apr, 2021 1 commit
  5. 02 Apr, 2021 1 commit
  6. 18 Mar, 2021 3 commits
  7. 12 Mar, 2021 1 commit
  8. 11 Mar, 2021 1 commit
  9. 09 Mar, 2021 1 commit
  10. 25 Feb, 2021 1 commit
  11. 23 Feb, 2021 6 commits
  12. 22 Feb, 2021 1 commit
  13. 19 Feb, 2021 1 commit
  14. 18 Feb, 2021 1 commit
  15. 17 Feb, 2021 1 commit
  16. 12 Feb, 2021 1 commit
  17. 11 Feb, 2021 1 commit
  18. 03 Feb, 2021 1 commit
  19. 02 Feb, 2021 1 commit
  20. 29 Jan, 2021 1 commit
  21. 07 Jan, 2021 1 commit
  22. 05 Jan, 2021 1 commit
  23. 04 Jan, 2021 2 commits
    • [chore] 0.1.2 version bump (#285) · a21f50f9
      Benjamin Lefaudeux authored
    • [feat] sync adascale from internal repo, support add_param_group (#266) · 3932a1f6
      Min Xu authored
      * [feat] sync adascale from internal repo
      
      - tbd
      
      testing: tbd
      
      * Update the argument documentation of __init__
      
      * update documentation around set_num_gradients_to_accumulate
      
      * added checks that the APIs are called from the proper places
      
      * renamed internal APIs to mark them as internal
      
      * updated changelog
      
      * added support for add_param_group and its unit test
      
      * added unit test for set_num_gradients_to_accumulate
      
      * added debias_ewma unit test
      
      * fixed test_set_num_gradients_to_accumulate (need zero_grad() call)
      
      * added missing zero_grad() to test_lr_scheduler
      
      * fixed test_add_param_group with respect to optim.zero_grad()
      
      * added test_gradient_value
      
      * added test_scale_not_equal_default for scale != world_size * grad_accum
      
      * added test_unhook()
      
      * removed print statements
      
      * fixed a typo
      
      * addressed Ben's comment
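      The add_param_group support this commit adds can be sketched in plain Python:
      an optimizer wrapper that keeps per-group statistics must extend them whenever
      a new parameter group is registered. This is a hedged stand-in, not fairscale's
      AdaScale API; `GainTracker` and `_state` are hypothetical names.

      ```python
      # Hedged sketch: why add_param_group needs explicit support in a wrapper
      # that tracks per-parameter state. Illustrative only, not AdaScale itself.


      class GainTracker:
          """Keeps one statistics slot per parameter in each param group."""

          def __init__(self, param_groups):
              self.param_groups = list(param_groups)
              # One running statistic per parameter, per group.
              self._state = [[0.0] * len(g) for g in self.param_groups]

          def add_param_group(self, group):
              # Without this, later updates would index past the end of the
              # tracked statistics for parameters in the new group.
              self.param_groups.append(group)
              self._state.append([0.0] * len(group))
      ```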
  24. 30 Dec, 2020 1 commit
  25. 24 Dec, 2020 1 commit
    • [chore] Update changelog (#268) · 18455bf0
      Min Xu authored
      * Update changelog
      
      this item was missed in the previous AdaScale commit.
      
      * More change log
      
      * Addressed review comments
  26. 03 Dec, 2020 1 commit
    • [feat] AdaScale: Gradient Accumulation and Add PyTest unit tests (#202) · ce5860ea
      Min Xu authored
      * added AdaScale to README
      
      * [adascale] added gradient accumulation
      
      - added gradient accumulation
      - tested with full CIFAR trainings using different accumulation values
        and verified that full accuracy is obtained
      - also removed the patch optimize flag until we need it
      
      * [adascale] adding pytest
      
      - added basic, ddp, and grad_accum tests
      - closes #195
      
      * added changelog
      
      * added ddp grad_accum test
      
      * moved ddp and non-ddp tests into separate files
      
      * added checkpoint test
      
      * more doc
      
      * addressed Mike's comments
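      The gradient-accumulation pattern this commit adds (and the missing
      zero_grad() bug the later test fixes caught) can be sketched in plain Python.
      `SimpleSGD` and `train` are illustrative stand-ins under assumed semantics,
      not fairscale's AdaScale implementation.

      ```python
      # Hedged sketch: sum gradients over N micro-batches, then take one
      # optimizer step. Plain Python, no torch; names are hypothetical.


      class SimpleSGD:
          def __init__(self, params, lr):
              self.params = list(params)
              self.grads = [0.0] * len(self.params)
              self.lr = lr

          def zero_grad(self):
              self.grads = [0.0] * len(self.params)

          def step(self):
              self.params = [p - self.lr * g
                             for p, g in zip(self.params, self.grads)]


      def train(opt, micro_batches, accum_steps):
          for i, grads in enumerate(micro_batches):
              # Accumulate instead of stepping after every micro-batch.
              opt.grads = [g0 + g for g0, g in zip(opt.grads, grads)]
              if (i + 1) % accum_steps == 0:
                  opt.step()
                  # Forgetting this call lets gradients leak into the next
                  # accumulation window, the bug the tests above guard against.
                  opt.zero_grad()
          return opt.params
      ```

      With accumulation, one step with the summed gradient replaces `accum_steps`
      small steps, which is why the tests compare accuracy against non-accumulated
      full trainings.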
  27. 02 Dec, 2020 1 commit
  28. 01 Dec, 2020 1 commit
  29. 15 Oct, 2020 1 commit
  30. 28 Aug, 2020 1 commit
  31. 31 Jul, 2020 1 commit