1. 12 Feb, 2021 1 commit
  2. 05 Feb, 2021 1 commit
  3. 03 Feb, 2021 2 commits
    • [chore] disheartening switch off of an OSS cpu test (#356) · 011c0c41
      Benjamin Lefaudeux authored
      * precise skip, only if agent has only cpu
    • [feat] Add AdaScaleWrapper (#347) · a2408eb8
      Min Xu authored
      * [feat] Add AdaScaleWrapper
      
      - This enables a different API for wrapping an optimizer with AdaScale.
      - This also enables AdaScale to be wrapped by OSS.
      - However, wrapping AdaScale with OSS results in a different optimization,
        and future research will be needed to study its effects.
      
      testing: add unit tests.
      
      * addressed comment: typo
  4. 02 Feb, 2021 1 commit
  5. 29 Jan, 2021 1 commit
    • [test]: test with py39 + torch 1.8 nightly (#339) · e348806b
      Min Xu authored
      * [test]: test with py39 + torch 1.8 nightly
      
      * version fix
      
      * more fix
      
      * fix version function for nightly version
      
      * fix torch_pg build
      
      * invalidate cache
      
      * separate benchmark requirements
      
      * comment
      
      * fixed mypy
      
      * fixed a test
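The "fix version function for nightly version" bullet refers to parsing torch version strings that carry nightly or local suffixes. A minimal sketch, assuming the goal is to turn strings such as `1.8.0.dev20210129` or `1.7.1+cu110` into comparable integer tuples; `parse_torch_version` is a hypothetical name, not fairscale's actual helper.

```python
import re
from typing import Tuple


def parse_torch_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from a torch.__version__-style string.

    Hypothetical helper: handles release builds ("1.7.1"), local version
    labels ("1.7.1+cu110") and nightly/dev builds ("1.8.0.dev20210129").
    """
    # Drop the local version label, e.g. "+cu110".
    public = version.split("+", 1)[0]
    # Nightly builds append ".devYYYYMMDD" or "a0"; keep only the leading numbers.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", public)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)
```

Returning a tuple lets callers gate tests with a plain comparison such as `parse_torch_version(torch.__version__) >= (1, 8, 0)`, which a nightly build like `1.8.0.dev20210129` then satisfies.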
  6. 28 Jan, 2021 1 commit
    • [test]: test adascale with oss (#328) · fa11d338
      Min Xu authored
      * [test]: test adascale with oss
      
      * minor fix
      
      * add a small comment
      
      * refactor: moved find_tensor_by_shape
      
      * refactor: move test golden data into its own module
      
      * refactor: simplified the train function
      
      * refactor: added comments as suggested
  7. 27 Jan, 2021 1 commit
  8. 20 Jan, 2021 1 commit
  9. 11 Jan, 2021 1 commit
  10. 08 Jan, 2021 3 commits
  11. 05 Jan, 2021 1 commit
    • [fix] Flaky tests (#283) · 79365ee6
      Benjamin Lefaudeux authored
      * adding the pytest timeout plugin to properly root out hanging tests
      * removing redundant code, slightly more reasonable timeout, works on a single CUDA device
      * finding the root cause of some of the CPU hangs (RPC init)
      * propagating all the rpc init test changes to the pipe and model parallel tests
  12. 04 Jan, 2021 1 commit
    • [feat] sync adascale from internal repo, support add_param_group (#266) · 3932a1f6
      Min Xu authored
      * [feat] sync adascale from internal repo
      
      - tbd
      
      testing: tbd
      
      * Update argument document of __init__
      
      * update documentation around set_num_gradients_to_accumulate
      
      * added checking code for proper API calling places
      
      * rename internal APIs to make them internal
      
      * updated changelog
      
      * added support for add_param_group and its unit test
      
      * added unit test for set_num_gradients_to_accumulate
      
      * added debias_ewma unit test
      
      * fixed test_set_num_gradients_to_accumulate (need zero_grad() call)
      
      * added missing zero_grad() to test_lr_scheduler
      
      * fixed test_add_param_group with respect to optim.zero_grad()
      
      * added test_gradient_value
      
      * added test_scale_not_equal_default for scale != world_size * grad_accum
      
      * added test_unhook()
      
      * removed print statements
      
      * fixed a typo
      
      * addressed Ben's comment
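The debias_ewma unit test refers to a bias-corrected exponential moving average. The sketch below shows the standard correction `m_t / (1 - beta**t)`, assuming AdaScale smooths its gradient statistics this way; `debiased_ewma` is a hypothetical standalone function, not the code from this commit.

```python
from typing import List


def debiased_ewma(values: List[float], beta: float = 0.999) -> List[float]:
    """Exponentially weighted moving average with bias correction (sketch).

    Assumption: the Adam-style correction m_t / (1 - beta**t).
    """
    avg = 0.0
    out: List[float] = []
    for t, x in enumerate(values, start=1):
        avg = beta * avg + (1.0 - beta) * x
        # Without this correction, early averages are biased toward the zero init.
        out.append(avg / (1.0 - beta ** t))
    return out
```

A useful property for a unit test: a constant input stream yields that constant at every step, even on step 1, which an uncorrected EWMA would not.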
  13. 29 Dec, 2020 1 commit
  14. 22 Dec, 2020 1 commit
    • [OSS] Balance the trainable params only (#262) · c386e937
      Benjamin Lefaudeux authored
      * fix, one liner
      
      * adjust so that frozen trunks still get spread, even though this should have little consequence
      
      * removing dead code, hopeful unit test fix
      
      * now with some linting
      
      * adding a proper unit test case
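This commit balances OSS shards by trainable parameters while still spreading frozen trunks across ranks. A minimal sketch of such a greedy partition, assuming each frozen tensor counts as one unit of load; `Param` and `partition` are hypothetical stand-ins, and fairscale's actual partitioning code may differ.

```python
from typing import List, NamedTuple


class Param(NamedTuple):
    # Hypothetical stand-in for a torch parameter: only the fields the sketch needs.
    name: str
    numel: int
    requires_grad: bool


def partition(params: List[Param], world_size: int) -> List[List[Param]]:
    """Greedily assign params to ranks, balancing trainable elements only."""
    shards: List[List[Param]] = [[] for _ in range(world_size)]
    sizes = [0] * world_size
    for p in params:
        # Give the next parameter to the currently least-loaded rank.
        rank = sizes.index(min(sizes))
        shards[rank].append(p)
        # Only trainable params carry optimizer state, so they dominate the cost;
        # frozen ones count as 1 so they are still spread across ranks.
        sizes[rank] += p.numel if p.requires_grad else 1
    return shards
```

With a large frozen tensor in the mix (e.g. `B` below with 1000 elements), counting it by size would push every trainable tensor onto the other rank; counting it as one unit keeps the trainable elements, and therefore the optimizer state, at 100 per rank.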
  15. 16 Dec, 2020 1 commit
    • [feat]: AdaScale work with lr_scheduler and tests, examples (#229) · d65cd838
      Min Xu authored
      * [doc]: AdaScale example and notes
      
      * formatted notes correctly as suggested by Benjamin
      
      * added feature and unit test to make sure lr_scheduler works
      
      * update the example with lr_scheduler
      
      * fixed doc with "make html"
      
      * addressed Mike's suggestions
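One way for a gain-based optimizer to cooperate with an lr_scheduler is to apply the gain only for the duration of `step()` and then restore the scheduler-controlled lr. The sketch below assumes that design; plain dicts stand in for `optimizer.param_groups`, and `GainScaledStep` is hypothetical, not necessarily how this commit implements it.

```python
from typing import Callable, Dict, List


class GainScaledStep:
    """Hypothetical wrapper: scale each group's lr by a gain only during step().

    Restoring the original lr afterwards means an lr scheduler that reads and
    writes param_groups[i]["lr"] never sees the gain-scaled value.
    """

    def __init__(self, param_groups: List[Dict[str, float]], step_fn: Callable[[], None]):
        self.param_groups = param_groups
        self._step_fn = step_fn  # stands in for the wrapped optimizer's step()

    def step(self, gains: List[float]) -> None:
        original = [group["lr"] for group in self.param_groups]
        for group, gain in zip(self.param_groups, gains):
            group["lr"] = gain * group["lr"]
        try:
            self._step_fn()  # the wrapped optimizer reads the scaled lr here
        finally:
            # Put back the scheduler-controlled lr, even if the step raised.
            for group, lr in zip(self.param_groups, original):
                group["lr"] = lr
```

Because the scheduler only ever sees the unscaled lr, a schedule such as halving the lr after each epoch composes with the gain instead of compounding it.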
  16. 14 Dec, 2020 1 commit
  17. 06 Dec, 2020 1 commit
  18. 03 Dec, 2020 1 commit
    • [feat] AdaScale: Gradient Accumulation and Add PyTest unit tests (#202) · ce5860ea
      Min Xu authored
      * added AdaScale to README
      
      * [adascale] added gradient accumulation
      
      - added gradient accumulation
      - tested with full CIFAR training runs using different accumulation values
        and verified that full accuracy is obtained
      - also removed the patch optimize flag until we need it
      
      * [adascale] adding pytest
      
      - added basic and ddp tests and grad_accum
      - closes #195
      
      * added changelog
      
      * added ddp grad_accum test
      
      * moved ddp and non-ddp tests into separate files
      
      * added checkpoint test
      
      * more doc
      
      * addressed Mike's comments
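With gradient accumulation, AdaScale sees one gradient per worker per accumulation step, and the default scale is world_size * num_gradients_to_accumulate. The sketch below estimates the AdaScale gain ratio `(var + sqr) / (var / scale + sqr)` from those per-step gradients, assuming plain lists of floats instead of tensors; `adascale_gain` is hypothetical and omits the moving-average smoothing a real implementation would apply.

```python
from typing import Optional, Sequence


def adascale_gain(worker_grads: Sequence[Sequence[float]], scale: Optional[float] = None) -> float:
    """Estimate the AdaScale gain ratio from n base-batch gradients (sketch).

    gain = (var + sqr) / (var / scale + sqr), where var estimates the gradient
    variance and sqr the squared norm of the true gradient.
    """
    n = len(worker_grads)  # e.g. world_size * num_gradients_to_accumulate
    if n < 2:
        raise ValueError("need at least two gradient samples")
    if scale is None:
        scale = float(n)  # default scale: one unit per gradient sample
    dim = len(worker_grads[0])
    mean = [sum(g[i] for g in worker_grads) / n for i in range(dim)]
    mean_sqr = sum(x * x for x in mean)  # ||mean gradient||^2
    avg_local_sqr = sum(sum(x * x for x in g) for g in worker_grads) / n
    # Unbiased estimates, clamped at zero because they are noisy.
    var = max(n / (n - 1) * (avg_local_sqr - mean_sqr), 0.0)
    sqr = max(mean_sqr - var / n, 0.0)
    denom = var / scale + sqr
    return 1.0 if denom == 0.0 else (var + sqr) / denom
```

The two extremes show the intent: identical gradients (no noise) give a gain of 1, so the lr is not increased, while pure-noise gradients give a gain equal to the scale.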
  19. 16 Nov, 2020 1 commit
  20. 06 Nov, 2020 1 commit
  21. 28 Oct, 2020 1 commit
  22. 14 Oct, 2020 2 commits
  23. 08 Oct, 2020 1 commit
  24. 15 Sep, 2020 2 commits
  25. 09 Sep, 2020 1 commit
    • [feat] OSS flatten state dict (#65) · 4f597233
      Benjamin Lefaudeux authored
      Changes the structure of the returned state dict with respect to param_groups so that it is closer to what a vanilla optimizer would return (the param_groups are un-sharded). They are sharded again when loading.
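The un-shard/re-shard round trip described above can be sketched as two inverse functions. This assumes each rank's param_groups are plain dicts and that the owning rank of every group is recorded next to the flat list; both helper names are hypothetical, and fairscale's actual state dict layout may store this mapping differently.

```python
from typing import Any, Dict, List, Tuple

ParamGroup = Dict[str, Any]


def flatten_param_groups(per_rank: List[List[ParamGroup]]) -> Tuple[List[ParamGroup], List[int]]:
    """Merge each rank's param_groups into one list, vanilla-optimizer style.

    Also returns the owning rank of every group so it can be re-sharded later.
    """
    flat: List[ParamGroup] = []
    owners: List[int] = []
    for rank, groups in enumerate(per_rank):
        for group in groups:
            flat.append(dict(group))  # shallow copy so callers can't alias shards
            owners.append(rank)
    return flat, owners


def shard_param_groups(flat: List[ParamGroup], owners: List[int], world_size: int) -> List[List[ParamGroup]]:
    """Inverse of flatten_param_groups: hand each group back to its rank."""
    per_rank: List[List[ParamGroup]] = [[] for _ in range(world_size)]
    for group, rank in zip(flat, owners):
        per_rank[rank].append(dict(group))
    return per_rank
```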
  26. 08 Sep, 2020 1 commit
    • [feat] OSS: Sync all attributes (#67) · 5a268b25
      Benjamin Lefaudeux authored
      Make sure that all attributes (not just LR) are in sync between OSS.param_groups and the actual wrapped optimizer. Some frameworks make it possible to alter any attribute on a schedule, which proves useful depending on the optimizer, so the keys need to be supported generically (not just "lr"). Not syncing these attributes is a worst-case scenario, since the adjustments are silently not propagated; this change fixes that.
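A minimal sketch of the generic sync described above, assuming the user-facing param_groups and the wrapped optimizer's param_groups line up by index; `sync_param_group_attributes` is a hypothetical helper, not fairscale's method.

```python
from typing import Any, Dict, List

ParamGroup = Dict[str, Any]


def sync_param_group_attributes(source: List[ParamGroup], target: List[ParamGroup]) -> None:
    """Copy every hyperparameter (not only "lr") from source groups to target groups.

    The "params" entry is skipped because each side holds its own tensors
    (for OSS, the wrapped optimizer only sees this rank's shard).
    """
    for src, dst in zip(source, target):
        for key, value in src.items():
            if key != "params":
                dst[key] = value
```

Iterating over all keys rather than a fixed list is what lets a scheduler that adjusts, say, `momentum` or `weight_decay` reach the wrapped optimizer.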
  27. 03 Sep, 2020 2 commits
    • Add grad scaler (#48) · b6a5e634
      Jun Ru Anderson authored
      Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require gradient scaling in order to converge.
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
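Since fairscale's GradScaler subclasses PyTorch's, the usual scale/step/update loop applies. The sketch below uses `torch.cuda.amp.GradScaler` directly (the fairscale subclass is not shown) with a toy linear model, and disables scaling when no GPU is available; in a real mixed-precision run the forward pass would also sit under `torch.cuda.amp.autocast()`.

```python
import torch
from torch.cuda.amp import GradScaler

# Toy setup: a single linear layer on GPU if available, otherwise CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Loss scaling targets CUDA fp16 training; disable it on CPU.
scaler = GradScaler(enabled=(device == "cuda"))

inputs = torch.randn(8, 4, device=device)
targets = torch.randn(8, 1, device=device)
weight_before = model.weight.detach().clone()

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()  # backprop a scaled loss to avoid fp16 gradient underflow
scaler.step(optimizer)         # unscales grads; skips the step if they contain inf/NaN
scaler.update()                # adjusts the scale factor for the next iteration
```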
    • [fix] OSS pytorch-compliant state dict (#61) · 1d1d15ea
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * after installing the specific isort, black, etc., a one-liner to please the linter
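PyTorch's `Optimizer.state_dict()` returns a dict with top-level `state` and `param_groups` keys, which is what the key check in this commit guards. A minimal sketch of such a check; `check_optimizer_state_dict` is a hypothetical helper, not the function added by the commit.

```python
from typing import Any, Dict


def check_optimizer_state_dict(state_dict: Dict[str, Any]) -> None:
    """Raise if a state dict lacks the top-level keys torch.optim.Optimizer uses."""
    missing = {"state", "param_groups"} - set(state_dict)
    if missing:
        raise KeyError(f"optimizer state dict is missing keys: {sorted(missing)}")
```

Failing early with an explicit key list is friendlier than the `KeyError` a vanilla `load_state_dict()` would raise deep inside its own logic.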
  28. 28 Aug, 2020 1 commit
  29. 27 Aug, 2020 3 commits
  30. 22 Aug, 2020 1 commit
  31. 21 Aug, 2020 1 commit
  32. 20 Aug, 2020 1 commit