  1. 24 Sep, 2022 1 commit
  2. 02 May, 2022 1 commit
    • [FSDP] ssd_offload fixing backward path (grad_fn) for SsdFlatParameter and SsdFlatParameterView (#974) · 51b53ddb
      Paul Johnson authored
      
      * [FSDP] fixing backward path for SsdFlatParameter and SsdFlatParameterView when overriding .data
      
      * Get ssd_offload unit tests passing
      
      * [FSDP] get all test_fsdp_offload tests passing w/ ssd_offload on
      
      * Update changelog
  3. 06 Apr, 2022 1 commit
  4. 09 Mar, 2022 1 commit
  5. 08 Mar, 2022 1 commit
  6. 03 Mar, 2022 1 commit
  7. 23 Feb, 2022 1 commit
  8. 15 Feb, 2022 1 commit
  9. 28 Jan, 2022 1 commit
  10. 14 Jan, 2022 1 commit
  11. 13 Jan, 2022 2 commits
    • [feature] [experimental] Layerwise Gradient Scaler (#879) · 52d066a2
      Anupam Bhatnagar authored
      * [skip ci] first commit
      
      * [skip ci] gradient scaler example
      
      * [skip ci] adding feed forward toy example
      
      * [skip ci] adding types
      
      * [skip ci] adding backward hook
      
      * [skip ci] update
      
      * [skip ci] working feed forward example
      
      * [skip ci] working feed forward example
      
      * [skip ci] use named_modules instead of named_children
      
      * [skip ci] adding new file
      
      * [skip ci] clean up
      
      * [skip ci] implement unscale function
      
      * [skip ci] implement unscale function
      
      * [skip ci] removing old file
      
      * [skip ci] removing some more old files
      
      * [skip ci] making unscale function generic
      
      * [skip ci] adding test for vision model
      
      * [skip ci] adding identity layer
      
      * [skip ci] cleanup files
      
      * [skip ci] refactoring
      
      * [skip ci] more refactoring
      
      * [skip ci] added functionality to update scale
      
      * [skip ci] data loader clean up
      
      * [skip ci] implemented inf checks and update scale functions
      
      * [skip ci]code clean up. added...
    • [Fix][FSDP] fixed padding size of input tensor for reduce scatter (#907) · fb4eca19
      tmarkstrum authored
      
      
      * fixed padding size of input tensor for reduce scatter, and fixed an error that assigned wrong group
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      
      * added changelog
      
      * fixed some commit.
      
      * added unit test to ensure the reduce_scatter process group size is correct in default cases. And fall back to default process group when the reduce_scatter process group has the wrong size.
      
      * throw an error instead of rolling back to use default process group for reduce_scatter_process_group
      
      * Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group"
      
      This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9.
      
      * added check for None to avoid unit test failure
      
      * fixed an error to avoid the unit tests failure
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
  12. 12 Jan, 2022 1 commit
  13. 06 Jan, 2022 1 commit
    • FullyShardedDataParallel: only return full state dict on rank 0 (#885) · d3417ceb
      four4fish authored
      * FullyShardedDataParallel: only return full state dict on rank 0
      
      * Add flag and make rank 0 only optional
      
      * Add tests
      
      * Add docs
      
      * address comments
      
      * update comments
      
      * update torch nightly version
      
      * update torchvision number for torch nightly dependence
      
      * add changelog
      
      * Update CHANGELOG.md
      
      * Update CHANGELOG.md
  14. 05 Jan, 2022 1 commit
    • Enabling ssd_offload training basic tests. (#887) · c5e471bc
      Paul Johnson authored
      * Enabling ssd_offload training and test via tests/nn/data_parallel/test_fsdp_offload.py.
      * Removed unused classes: SsdBuffer, SsdTensorHandleView, SsdParameter, SsdTensor
      * Enhance test coverage of test_ssd_offloading_train_flatten_params_wrapper
      * Modifications from PR #887 review comments.
      * Update Changelog
  15. 21 Dec, 2021 3 commits
  16. 02 Dec, 2021 1 commit
    • [fix] [FSDP] Do not lose original reshard_after_forward (#880) · 7c2c3e00
      Min Xu authored
      * [fix] [FSDP] Do not lose original reshard_after_forward
      
      - In a corner case we can lose this value
      - Saving it and using it in the reset function fixed it
      - A trivial case probably not worth a dedicated test for now
      
      * added changelog
  17. 18 Nov, 2021 2 commits
  18. 17 Nov, 2021 2 commits
  19. 12 Nov, 2021 1 commit
    • Setup pre-commit github action and apply pre-commit to all files (#849) · 7d7edf6d
      Anupam Bhatnagar authored
      * adding pre-commit files
      
      * applying pre-commit to all files
      
      * adding no-strict-optional argument to mypy in circle ci config
      
      * fix typo
      
      * updating python versions
      
      * [skip ci] remove extra args
      
      * adding python 3.9
      
      * [skip ci] set pre-commit version in requirements-dev.txt
      
      * set CACHE_VERSION
      
      * move linters from circleci to github actions
      
      * update python version
      
      * update python version in benchmarks_2
      
      * moving to python 3.9.7
  20. 08 Nov, 2021 3 commits
  21. 05 Nov, 2021 1 commit
    • [feat] experimental MEVO layer (#840) · 8347c1a2
      Min Xu authored
      
      
      * [feat] MEVO kernel
      
      - initial import from min/softmax and min/testing branches
      - need to rename and further cleanup
      
      * only test with newer pytorch
      
      * renamed and added comments and code cleanup
      
      * rename and reduce test memory
      
      * testing
      
      * minor fixing
      
      * fixing
      
      * more fix
      
      * changelog
      
      * more 1.7 and 1.8 paper cuts
      
      * remove dead code
      
      * addressed Benjamin's comments
      
      * addressed more comments
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  22. 01 Nov, 2021 1 commit
    • [feat] [FSDP]: add experimental support to shared weights (#836) · f2af4c66
      Min Xu authored
      
      
      * added a new test, passing without shared weights
      
      * tested weight sharing
      
      * added the test to test list file
      
      * extended to world_size = 2
      
      * fixed test
      
      * [feat]: add limited and experimental support for shared parameter
      
      * fixed tests
      
      * simplify to work with layer with at least 1 non-shared params and add code to pick up linked_param field for sharding the shared param
      
      * fixed the case where linked param is not in separate FSDP
      
      * changelog and remove old code
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  23. 27 Oct, 2021 1 commit
  24. 20 Oct, 2021 1 commit
  25. 20 Sep, 2021 1 commit
  26. 13 Sep, 2021 1 commit
  27. 12 Sep, 2021 1 commit
  28. 05 Sep, 2021 1 commit
  29. 12 Aug, 2021 2 commits
  30. 01 Aug, 2021 1 commit
  31. 31 Jul, 2021 1 commit
  32. 27 Jul, 2021 1 commit