1. 31 May, 2022 1 commit
  2. 26 May, 2022 1 commit
  3. 30 Mar, 2022 1 commit
  4. 14 Feb, 2022 1 commit
    • [chore] [cleanup]: pytest, pytorch new versions, fix tests (#933) · fae29959
      Min Xu authored
      
      
      * update pytest versions
      
      * [test] test related changes
      
      - upgrade to newer pytorch versions
      - added a function to make tests more deterministic on A100 and TF32 (see the sketch below)
      - fixed some tests so that they are correctly skipped on a single GPU system
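      That helper is not shown here; a minimal sketch of the idea, assuming PyTorch's standard TF32 and cuDNN switches (`_make_deterministic` is a hypothetical name):

      ```python
      import torch

      def _make_deterministic(seed: int = 0) -> None:
          # Hypothetical sketch: pin RNG seeds and disable TF32 so results
          # on A100 (where TF32 is on by default) match older GPUs.
          torch.manual_seed(seed)
          torch.cuda.manual_seed_all(seed)
          # TF32 trades precision for speed on Ampere; turn it off so test
          # expectations are reproducible.
          torch.backends.cuda.matmul.allow_tf32 = False
          torch.backends.cudnn.allow_tf32 = False
          torch.backends.cudnn.deterministic = True
          torch.backends.cudnn.benchmark = False
      ```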
      
      * more fixes
      
      * formatting overly long lines
      
      * format
      
      * better test without triggering a warning
      
      * fix an optim state bug with newer pytorch
      
      - the Adam optimizer now seems to return "step" as a singleton tensor in the
      nightly build
      - this fixes it, assuming a non-tensor value can still be loaded back by
      the optimizer (see the sketch below)
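      
      A minimal sketch of such a fix, assuming the state-dict layout PyTorch's Adam uses (the helper name is hypothetical):

      ```python
      import torch

      def _normalize_adam_state(state_dict: dict) -> dict:
          # Hypothetical sketch: newer nightlies store Adam's "step" as a
          # singleton tensor; cast it back to a plain Python number so code
          # that saved the old non-tensor format still round-trips.
          for param_state in state_dict.get("state", {}).values():
              step = param_state.get("step")
              if isinstance(step, torch.Tensor):
                  param_state["step"] = step.item()
          return state_dict
      ```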
      
      * improve oss.py
      
      - using min_loss for regression checking is a bit more reliable
      - also increased the num epochs from 10 to 12
      
      * small oss.py fix
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  5. 11 Feb, 2022 1 commit
  6. 14 Jan, 2022 1 commit
  7. 13 Jan, 2022 1 commit
    • [feature] [experimental] Layerwise Gradient Scaler (#879) · 52d066a2
      Anupam Bhatnagar authored
      * [skip ci] first commit
      
      * [skip ci] gradient scaler example
      
      * [skip ci] adding feed forward toy example
      
      * [skip ci] adding types
      
      * [skip ci] adding backward hook
      
      * [skip ci] update
      
      * [skip ci] working feed forward example
      
      * [skip ci] working feed forward example
      
      * [skip ci] use named_modules instead of named_children
      
      * [skip ci] adding new file
      
      * [skip ci] clean up
      
      * [skip ci] implement unscale function
      
      * [skip ci] implement unscale function
      
      * [skip ci] removing old file
      
      * [skip ci] removing some more old files
      
      * [skip ci] making unscale function generic
      
      * [skip ci] adding test for vision model
      
      * [skip ci] adding identity layer
      
      * [skip ci] cleanup files
      
      * [skip ci] refactoring
      
      * [skip ci] more refactoring
      
      * [skip ci] added functionality to update scale
      
      * [skip ci] data loader clean up
      
      * [skip ci] implemented inf checks and update scale functions
      
      * [skip ci] code clean up. added...
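      
      The bullets above outline the mechanism: per-layer backward hooks, an unscale function, inf checks, and scale updates. A minimal sketch of that idea (hypothetical names, not the fairscale API):

      ```python
      import torch
      import torch.nn as nn

      class ToyLayerwiseScaler:
          """Hypothetical sketch: each parameterized layer gets its own loss
          scale.  Gradients are multiplied by the layer's scale on the way
          into its backward and divided on the way out, so only that layer's
          backward runs in the scaled range; parameter grads are unscaled
          (with an inf check and a scale update) before the optimizer step."""

          def __init__(self, model: nn.Module, init_scale: float = 2.0 ** 10):
              self.model = model
              self.scales: dict = {}
              for name, mod in model.named_modules():
                  if not list(mod.parameters(recurse=False)):
                      continue  # only track layers that own parameters
                  self.scales[name] = init_scale
                  # Scale the gradient entering this layer's backward ...
                  mod.register_forward_hook(self._scale_out(name))
                  # ... and undo it for gradients leaving toward earlier layers.
                  mod.register_full_backward_hook(self._unscale_in(name))

          def _scale_out(self, name):
              def hook(mod, inputs, output):
                  if isinstance(output, torch.Tensor) and output.requires_grad:
                      output.register_hook(lambda g: g * self.scales[name])
              return hook

          def _unscale_in(self, name):
              def hook(mod, grad_input, grad_output):
                  s = self.scales[name]
                  return tuple(g / s if g is not None else g for g in grad_input)
              return hook

          def unscale_and_update_(self, growth: float = 2.0, backoff: float = 0.5):
              # Parameter grads still carry their layer's scale: divide it out,
              # then shrink the scale on overflow or grow it otherwise.
              for name, mod in self.model.named_modules():
                  if name not in self.scales:
                      continue
                  s, found_inf = self.scales[name], False
                  for p in mod.parameters(recurse=False):
                      if p.grad is not None:
                          p.grad.div_(s)
                          found_inf |= not torch.isfinite(p.grad).all().item()
                  self.scales[name] = s * (backoff if found_inf else growth)
      ```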
  8. 12 Nov, 2021 1 commit
    • Setup pre-commit github action and apply pre-commit to all files (#849) · 7d7edf6d
      Anupam Bhatnagar authored
      * adding pre-commit files
      
      * applying pre-commit to all files
      
      * adding no-strict-optional argument to mypy in circle ci config
      
      * fix typo
      
      * updating python versions
      
      * [skip ci] remove extra args
      
      * adding python 3.9
      
      * [skip ci] set pre-commit version in requirements-dev.txt
      
      * set CACHE_VERSION
      
      * move linters from circleci to github actions
      
      * update python version
      
      * update python version in benchmarks_2
      
      * moving to python 3.9.7
  9. 10 Sep, 2021 1 commit
  10. 06 Sep, 2021 1 commit
    • [cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744) · 3ecf76f4
      Min Xu authored
      
      * changelog; mypy; oss cleanup
      
      * more broadcast_object cleanup in FSDP
      
      * one more mypy fix
      
      * retire pytorch 1.6 from circleci, add new nightly, add 1.8 LTS and 1.9 stable release
      
      * update torch version for LTS
      
      * minor fixes
      
      * update cache key
      
      * trying newer gpu VMs
      
      * bump the cache
      
      * update to gpu.medium, which should be 2 GPUs
      
      * update nightly version
      
      * add pre-commit instruction
      
      * fixed CHANGELOG after merging
      
      * updated to newer nightly
      
      * retained the older broadcast function in oss.py for older GPUs
      
      * fixed a bug
      
      * added a comment
      
      * fixing a test for pytorch 1.10
      
      * testing a fix
      
      * Update fairscale/optim/oss.py
      
      * Update CONTRIBUTING.md
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  11. 27 Jul, 2021 1 commit
  12. 26 Jun, 2021 1 commit
  13. 08 May, 2021 1 commit
  14. 06 Apr, 2021 1 commit
  15. 05 Apr, 2021 1 commit
  16. 04 Apr, 2021 1 commit
  17. 19 Mar, 2021 1 commit
  18. 18 Mar, 2021 1 commit
  19. 17 Mar, 2021 1 commit
  20. 15 Mar, 2021 1 commit
  21. 12 Mar, 2021 1 commit
  22. 11 Mar, 2021 1 commit
  23. 09 Mar, 2021 1 commit
  24. 05 Mar, 2021 1 commit
  25. 04 Mar, 2021 1 commit
    • [test] AdaScale & SDP/FSDP (#468) · efed9cee
      Min Xu authored
      - cover them in terms of code path only
      - numerically, AdaScale behaves differently on SDP/FSDP than on DDP, mainly
        due to each worker's partial view of the gradients
      - this doesn't mean it is not useful, but it has yet to be validated
      - not going to spend too much time on it until we have a real use case
  26. 23 Feb, 2021 1 commit
    • Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e
      Myle Ott authored
      Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.
      
      Compared to PyTorch DDP:
      * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
      * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
      * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
          * all-gather parameters at start of forward pass and start of backward pass
          * reduce-scatter grads at end of backward pass
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
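      
      A usage sketch of the drop-in replacement described above, assuming a process group is already initialized and that `my_module`, `loader`, and `loss_fn` exist:

      ```python
      import torch
      from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

      # Wrap the model exactly where DDP would be wrapped.
      model = FSDP(my_module, reshard_after_forward=True)  # ZeRO-3-like mode
      optim = torch.optim.Adam(model.parameters(), lr=1e-4)

      for batch, target in loader:
          loss = loss_fn(model(batch), target)  # params all-gathered in forward
          loss.backward()                       # grads reduce-scattered here
          optim.step()                          # each rank updates only its shard
          optim.zero_grad()
      ```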
  27. 22 Feb, 2021 1 commit
  28. 19 Feb, 2021 1 commit
  29. 14 Feb, 2021 1 commit
  30. 12 Feb, 2021 1 commit
  31. 05 Feb, 2021 1 commit
  32. 03 Feb, 2021 2 commits
    • [chore] disheartening switch-off of an OSS CPU test (#356) · 011c0c41
      Benjamin Lefaudeux authored
      * precise skip: only when the agent is CPU-only
    • [feat] Add AdaScaleWrapper (#347) · a2408eb8
      Min Xu authored
      * [feat] Add AdaScaleWrapper
      
      - This enables a different API for wrapping an optimizer with AdaScale.
      - This also enables AdaScale to be wrapped by OSS.
      - However, OSS wrapping AdaScale results in different optimization behavior,
        whose effects will require further study.
      
      testing: add unit tests.
      
      * addressed comment: typo
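      
      A usage sketch of the two wrapping orders described above; `model` is assumed to exist, and the exact `AdaScaleWrapper` signature shown here is an assumption based on the description:

      ```python
      from torch.optim import SGD
      from fairscale.optim import AdaScale, AdaScaleWrapper

      params = list(model.parameters())  # `model` is assumed to exist

      # Original API: AdaScale wraps an already-constructed optimizer.
      opt_a = AdaScale(SGD(params, lr=0.1))

      # New wrapper API: pass the optimizer class and its kwargs instead,
      # giving AdaScale an optimizer-like constructor so that it can itself
      # be wrapped, e.g. by OSS.
      opt_b = AdaScaleWrapper(params, optim_cls=SGD, lr=0.1)
      ```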
  33. 02 Feb, 2021 1 commit
  34. 29 Jan, 2021 1 commit
    • [test]: test with py39 + torch 1.8 nightly (#339) · e348806b
      Min Xu authored
      * [test]: test with py39 + torch 1.8 nightly
      
      * version fix
      
      * more fix
      
      * fix version function for nightly version
      
      * fix torch_pg build
      
      * invalidate cache
      
      * separate benchmark requirements
      
      * comment
      
      * fixed mypy
      
      * fixed a test
  35. 28 Jan, 2021 1 commit
    • [test]: test adascale with oss (#328) · fa11d338
      Min Xu authored
      * [test]: test adascale with oss
      
      * minor fix
      
      * add a small comment
      
      * refactor: moved find_tensor_by_shape
      
      * refactor: move test golden data into its own module
      
      * refactor: simplified the train function
      
      * refactor: added comments as suggested
  36. 27 Jan, 2021 1 commit
  37. 20 Jan, 2021 1 commit
  38. 11 Jan, 2021 1 commit
  39. 08 Jan, 2021 1 commit