1. 14 Feb, 2022 1 commit
    • [chore] [cleanup]: pytest, pytorch new versions, fix tests (#933) · fae29959
      Min Xu authored
      
      
      * update pytest versions
      
      * [test] test related changes
      
      - upgrade to newer pytorch versions
      - added a function to make tests more deterministic on A100 with TF32
      - fixed some tests so that they are correctly skipped on a single GPU system
      
      * more fixes
      
      * formatting overly long lines
      
      * format
      
      * better test without triggering a warning
      
      * fix an optim state bug with newer pytorch
      
      - the Adam optimizer now seems to return "step" as a singleton tensor in the
      nightly build
      - this fixes it, assuming a non-tensor value can still be loaded back by
      the optimizer
      
      * improve oss.py
      
      - using min_loss for regression checking is a bit more reliable
      - also increased the num epochs from 10 to 12
      
      * small oss.py fix
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
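The optimizer state fix described in the commit above can be illustrated with a small sketch. This is not the code from the commit: `unwrap_singleton` and `normalize_param_state` are hypothetical names, and the helper relies only on the `numel()` and `item()` methods that PyTorch tensors provide, so it runs without PyTorch installed. It assumes, as the commit message does, that the optimizer accepts a plain Python number for "step" when the state is loaded back.

```python
from typing import Any, Dict


def unwrap_singleton(value: Any) -> Any:
    """Return a plain Python scalar for single-element tensor-like values.

    Values without callable ``numel``/``item`` methods (ints, floats, None, ...)
    and values holding more than one element are returned unchanged.
    """
    numel = getattr(value, "numel", None)
    item = getattr(value, "item", None)
    if callable(numel) and callable(item) and numel() == 1:
        return item()
    return value


def normalize_param_state(state: Dict[str, Any]) -> Dict[str, Any]:
    """Copy one parameter's optimizer state, unwrapping only the "step" entry."""
    out = dict(state)  # shallow copy; other entries are left untouched
    if "step" in out:
        out["step"] = unwrap_singleton(out["step"])
    return out
```

Applied to each per-parameter entry of an optimizer state dict, this leaves older PyTorch versions (where "step" is already an int) unaffected.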
  2. 26 Jun, 2021 1 commit
  3. 13 May, 2021 1 commit
    • [fix] add and use get_process_group_cached (#678) · bde4bac5
      Min Xu authored
      * [fix] add and use get_process_group_cached
      
      - This commit makes FSDP avoid creating too many process groups by default
      - Extra process groups cost GPU memory and init time
      
      * add changelog
      
      * lint
      
      * note on speed
      
      * add better assert output
      
      test seems to be flaky:
      https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps
      
      
      
      * update test reference memory values
      
      - With cached process groups, the memory reported by pytorch is reduced
      as well (due to bucket buffer memory for the reduction buffer)
      - The effect is actually larger on the SMI memory, which is not reported
      by pytorch and not checked by this test.
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update CHANGELOG.md
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * improved changelog
      
      * better handling of underscores in the md file
      Co-authored-by: Min Xu <min.xu@acm.org>
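The idea behind caching process groups can be sketched as follows. This is a minimal illustration, not the actual `get_process_group_cached` from fairscale: `ProcessGroupCache` and its `factory` argument are hypothetical. In a real setup the factory would presumably wrap something like `torch.distributed.new_group`; here it is left abstract so the sketch runs without a distributed backend.

```python
from typing import Callable, Dict, Generic, Optional, Sequence, Tuple, TypeVar

G = TypeVar("G")
Key = Optional[Tuple[int, ...]]  # None stands for "all ranks"


class ProcessGroupCache(Generic[G]):
    """Create each distinct process group once and reuse it afterwards."""

    def __init__(self, factory: Callable[[Key], G]) -> None:
        self._factory = factory  # hypothetical: builds a group for the given ranks
        self._groups: Dict[Key, G] = {}

    def get(self, ranks: Optional[Sequence[int]] = None) -> G:
        # Normalize to a hashable key so lists and tuples of ranks share an entry.
        key: Key = None if ranks is None else tuple(ranks)
        if key not in self._groups:
            self._groups[key] = self._factory(key)
        return self._groups[key]
```

With such a cache, several wrapped modules that request the same set of ranks share a single group instead of each creating its own, which is where the memory and init-time savings mentioned in the commit would come from.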
  4. 08 May, 2021 1 commit
  5. 02 Apr, 2021 1 commit
  6. 11 Jan, 2021 1 commit
  7. 28 Dec, 2020 1 commit
  8. 11 Nov, 2020 2 commits
  9. 10 Nov, 2020 1 commit
    • Single-process control via PipeRPCWrapper (#156) · 5d4f50fb
      Tom Birch authored
      Adds support for:
      * Reused layers (e.g. for weight sharing)
      * Lazily-constructed layers
      * Single-process control via PipeRPCWrapper
      * PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive
      
      Also added examples for multi-process and PipeRPCWrapper
  10. 23 Oct, 2020 1 commit
  11. 21 Oct, 2020 1 commit
  12. 17 Oct, 2020 1 commit
  13. 16 Oct, 2020 2 commits
  14. 14 Oct, 2020 1 commit
  15. 08 Oct, 2020 1 commit
  16. 05 Oct, 2020 1 commit
  17. 02 Oct, 2020 1 commit