1. 12 Aug, 2021 1 commit
    • anj-s
      [FSDP][feature] Support returning the original parameter names after a model has been wrapped with FSDP (#755) · a825348d
      anj-s authored
      
      * checkpoint work
      
      * fix lint issues
      
      * remove debug statement
      
      * remove print
      
      * fix lint errors
      
      * fix lint errors
      
      * fix lint errors
      
      * add comments and fix lint errors
      
      * modified comments and tests
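A torch-free sketch of the bookkeeping such a feature needs: record each original name with its offset and length at flatten time, then read the mapping back later. All names below are illustrative, not fairscale's actual API.

```python
# Sketch: map a single flat parameter back to original parameter names.
# Plain Python lists stand in for tensors.

def flatten(params):
    """Flatten {name: list_of_values} into one flat list plus metadata."""
    flat, meta = [], []
    for name, values in params.items():
        meta.append((name, len(flat), len(values)))  # (name, offset, numel)
        flat.extend(values)
    return flat, meta

def original_named_parameters(flat, meta):
    """Recover (original_name, values) pairs from the flat buffer."""
    for name, offset, numel in meta:
        yield name, flat[offset:offset + numel]

params = {"layer1.weight": [1.0, 2.0], "layer1.bias": [0.5]}
flat, meta = flatten(params)
assert dict(original_named_parameters(flat, meta)) == params
```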
  2. 10 Aug, 2021 1 commit
    • Rahul Iyer
      Fix pre-commit hook failures (#756) · 31d600cc
      Rahul Iyer authored
      The pre-commit hooks fail when run on all files for three reasons
      (see the trace below):

      1. Trailing whitespace in multiple files
      2. mypy fails to load numpy and subsequently fails to load
         LazyModule from pipe.py
      3. isort sees issues with known_third_party packages
      
      ```
      > pre-commit run --all-files
      
      Trim Trailing Whitespace.................................................Failed
      - hook id: trailing-whitespace
      - exit code: 1
      - files were modified by this hook
      
      Fixing docs/source/conf.py
      Fixing fairscale/experimental/nn/auto_shard.py
      Fixing docs/source/deep_dive/activation_checkpointing.rst
      Fixing docs/source/tutorials/pipe.rst
      Fixing docs/source/installation_instructions.rst
      Fixing docs/source/deep_dive/pipeline_parallelism.rst
      Fixing docs/source/tutorials/activation_checkpointing.rst
      Fixing docs/source/tutorials/offload_model.rst
      Fixing docs/source/deep_dive/oss_sdp_fsdp.rst
      Fixing docs/source/what_is_fairscale.rst
      Fixing CHANGELOG.md
      Fixing fairscale/experimental/nn/offload.py
      Fixing docs/source/index.rst
      Fixing docs/source/deep_dive/adascale.rst
      Fixing README.md
      Fixing docs/source/tutorials/oss.rst
      Fixing docs/source/deep_dive/offload.rst
      
      Check python ast.........................................................Passed
      Check for merge conflicts................................................Passed
      Don't commit to branch...................................................Passed
      Check for added large files..............................................Passed
      Fix End of Files.........................................................Failed
      - hook id: end-of-file-fixer
      - exit code: 1
      - files were modified by this hook
      
      Fixing requirements.txt
      Fixing docs/source/getting_started.rst
      Fixing docs/source/installation_instructions.rst
      Fixing codecov.yml
      Fixing docs/source/deep_dive/adascale.rst
      Fixing docs/source/tutorials/oss.rst
      Fixing docs/source/deep_dive/offload.rst
      
      black....................................................................Passed
      flake8...................................................................Passed
      seed isort known_third_party.............................................Failed
      - hook id: seed-isort-config
      - exit code: 1
      - files were modified by this hook
      isort....................................................................Passed
      mypy.....................................................................Failed
      - hook id: mypy
      - exit code: 2
      
      setup.cfg:45: error: Error importing plugin 'numpy.typing.mypy_plugin': No module named 'numpy'
      Found 1 error in 1 file (checked 197 source files)
      ```
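One common way to address the numpy-related mypy failure above is to declare numpy as a dependency of the mypy hook itself, so the plugin configured in setup.cfg can import it. A hypothetical `.pre-commit-config.yaml` fragment (repo and option names are real pre-commit features; the rev is illustrative):

```yaml
# Hypothetical fragment: give the mypy hook its own numpy so the
# numpy.typing.mypy_plugin configured in setup.cfg can load.
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.910
  hooks:
    - id: mypy
      additional_dependencies: [numpy]
```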
  3. 02 Aug, 2021 2 commits
  4. 01 Aug, 2021 1 commit
  5. 31 Jul, 2021 1 commit
  6. 30 Jul, 2021 1 commit
    • Yanli Zhao
      [FSDP] Move final backward callback queueing to pre-backward hook of root instance (#753) · ba7df621
      Yanli Zhao authored
      Summary:

      Move the final backward callback queueing to the pre-backward hook of
      the root FSDP instance, so that it is always attached to the outermost
      backward call and fires after all backward calls have completed.

      Also added flags to check that the final backward callback is fired
      whenever it is required.

      If the root FSDP instance is checkpointed and called multiple times in
      forward, a checkpoint counter is used to make sure the final backward
      callback is also queued inside the last inner backward call.
      
      Test Plan: unit tests
      
      * reformat

      * nits and unit tests

      * address some comments

      * replace m with self

      * reformat

      * nits

      * remove the fired flag

      * assert state on root only

      * comments

      * comments
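The pattern in the commit above can be sketched without torch: the root instance queues a single "final" callback from its pre-backward hook, no matter how many inner backward calls happen, and a flag verifies that it actually fired. Names are illustrative, not fairscale's API; the real code hands the callback to the autograd engine, which is simulated here by a queue drained after all backward work.

```python
# Torch-free sketch of queueing a final backward callback from the
# root's pre-backward hook.

class RootBackwardTracker:
    def __init__(self):
        self._callback_queued = False
        self._callback_fired = False
        self._queued = []

    def pre_backward_hook(self):
        # Queue the final callback only once per backward pass, even if
        # checkpointing triggers multiple inner backward calls.
        if not self._callback_queued:
            self._callback_queued = True
            self._queued.append(self._final_backward_callback)

    def _final_backward_callback(self):
        self._callback_fired = True
        self._callback_queued = False  # ready for the next iteration

    def run_backward(self, num_inner_calls=1):
        for _ in range(num_inner_calls):
            self.pre_backward_hook()
        # The autograd engine would invoke queued callbacks after all
        # backward work completes; simulate that here.
        while self._queued:
            self._queued.pop()()
        assert self._callback_fired, "final backward callback never fired"

tracker = RootBackwardTracker()
tracker.run_backward(num_inner_calls=3)
assert tracker._callback_fired
```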
  7. 27 Jul, 2021 2 commits
  8. 26 Jul, 2021 1 commit
    • Min Xu
      [feat]: prepare FSDP to handle multiple flatten params and fixed metadata saving for MoE (#746) · 83b0b49e
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 3: make FSDP use FlattenParamModule unconditionally
      
      * fixing the auto_wrap tests
      
      * minor
      
      * rewrite local_metadata_dict
      
      - updated FPW so that custom flat param name is also supported
      
      * bug fix
      
      * mypy
      
      * rewrote consolidate_shard_weights
      
      - test_consolidate passes
      
      * comments
      
      * fixing pickling
      
      * Fix shared params and MoE logic (#749)
      
      * add strict kwarg to support fairseq:gshard MoE saving logic
      
      * Test fairseq style shard
      
      * style
      
      * formatting and address comments
      
      * added changelog
      
      * fixing a test after padding renaming
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
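The shard-consolidation idea from the commit above can be sketched without torch: each rank holds one evenly padded shard of the flat buffer, and consolidation concatenates the shards and drops the padding. Names are illustrative, not the real `consolidate_shard_weights` signature.

```python
import math

def shard(flat, world_size):
    """Evenly shard a flat buffer, padding the tail with zeros."""
    per_rank = math.ceil(len(flat) / world_size)
    padded = flat + [0.0] * (per_rank * world_size - len(flat))
    return [padded[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]

def consolidate(shards, numel):
    """Concatenate per-rank shards and drop the padding."""
    full = [v for s in shards for v in s]
    return full[:numel]

flat = [1.0, 2.0, 3.0, 4.0, 5.0]
shards = shard(flat, world_size=2)
assert shards == [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
assert consolidate(shards, numel=len(flat)) == flat
```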
  9. 19 Jul, 2021 1 commit
  10. 12 Jul, 2021 2 commits
  11. 07 Jul, 2021 1 commit
  12. 28 Jun, 2021 4 commits
  13. 26 Jun, 2021 2 commits
  14. 25 Jun, 2021 3 commits
  15. 23 Jun, 2021 1 commit
  16. 22 Jun, 2021 1 commit
    • Pavel Belevich
      Update torch to 1.9.0 release (#717) · 1cc4c837
      Pavel Belevich authored
      * Update torch to 1.9.0.dev20210614+cu102
      
      * Update config.yml
      
      * Update config.yml
      
      * Update setup.py
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
  17. 21 Jun, 2021 1 commit
    • Min Xu
      [feat] FSDP: supporting multiple flatten parameter groups (#711) · ab71efb3
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 2: extending FPW to support multiple flat param groups
      - FSDP still only uses one group
      - unit tests exercise the new code paths
      - updated the changelog
      
      * first cut, mypy passed
      
      * test_flatten_params_wrapper.py::TestFlattenParams tests pass
      
      * added two more test cases and fixed a case in the code
      
      * fixed one bug with param_path_infos
      
      * fixed two more tests with hardcoded flat_param names
      
      * Update CHANGELOG.md
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
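Keeping several flatten-parameter groups side by side, as the commit above describes, can be sketched like this: parameters are bucketed by a grouping key, and each group is flattened independently with its own metadata. Purely illustrative; the grouping function and names are not fairscale's API.

```python
# Sketch: one flat buffer per parameter group instead of a single
# global flat parameter.

def flatten_groups(named_params, group_of):
    """Flatten parameters into one flat list per group key."""
    flats, metas = {}, {}
    for name, values in named_params.items():
        key = group_of(name)
        flat = flats.setdefault(key, [])
        metas.setdefault(key, []).append((name, len(flat), len(values)))
        flat.extend(values)
    return flats, metas

named = {
    "expert.w": [1.0, 2.0],
    "shared.w": [3.0],
    "expert.b": [4.0],
}
flats, metas = flatten_groups(named, group_of=lambda n: n.split(".")[0])
assert flats["expert"] == [1.0, 2.0, 4.0]
assert flats["shared"] == [3.0]
```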
  18. 14 Jun, 2021 1 commit
  19. 11 Jun, 2021 3 commits
    • anj-s
      [Offload][feature] Add auto shard functionality to remove requirement of nn.Sequential models. (#695) · cbeda830
      anj-s authored
      
      * auto wrap functionality
      
      * lint and doc strings
      
      * fix lint errors
      
      * lint errors and version skips
      
      * remove mypy checking and add conditional import
      
      * another math.prod instance
      
      * another import fix
      
      * address comments
      
      * lint errors
      
      * address comments
      
      * fix lint errors
      
      * add placeholder nodes to tracker list
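The auto-shard idea above, reduced to its simplest form: given an ordered list of layers (fairscale obtains the order by tracing the module graph), partition it into roughly even contiguous shards so the user no longer has to hand over an nn.Sequential. A minimal sketch with hypothetical names:

```python
# Sketch: greedily partition an ordered list of layers into
# num_shards contiguous groups of roughly equal size.

def auto_shard(layers, num_shards):
    per_shard = -(-len(layers) // num_shards)  # ceiling division
    return [layers[i:i + per_shard] for i in range(0, len(layers), per_shard)]

layers = ["embed", "attn", "mlp", "norm", "head"]
shards = auto_shard(layers, num_shards=2)
assert shards == [["embed", "attn", "mlp"], ["norm", "head"]]
```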
    • anj-s
      remove examples dir (#712) · 7bdb9a7f
      anj-s authored
    • Pete
      Use original forward pass directly when in eval mode from within checkpoint wrapper (#709) · 370b8483
      Pete authored
      * add failing test
      
      * add fix
      
      * use 'torch.is_grad_enabled()' instead of 'module.training'
      
      * Revert "add failing test"
      
      This reverts commit 1c34242208f9b2c5fa6c8f181434c2be6d7cdbc0.
      
      * add simple test
      
      * improve test
      
      * add check for fwd_counter
      
      * revert typing/format changes
      
      * move to new test file
      
      * CHANGELOG
      
      * remove old test
      
      * fix import order
      
      * fix test to be compat with torch 1.6.0
      
      * clean up
      
      * comments
      
      * isort 🤦
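The dispatch in the commit above can be sketched without torch: the wrapper calls the original forward directly when gradients are disabled (e.g. during evaluation), and only takes the checkpointed path when grad is enabled. Here `grad_enabled` stands in for `torch.is_grad_enabled()`, and the counters mirror the commit's `fwd_counter` check; everything else is illustrative.

```python
# Sketch: checkpoint wrapper that skips the checkpointed path when
# gradients are disabled, since recomputation buys nothing in eval.

def make_checkpointed(forward):
    calls = {"plain": 0, "checkpointed": 0}

    def wrapped(x, grad_enabled):
        if not grad_enabled:
            calls["plain"] += 1
            return forward(x)   # direct call, no recomputation machinery
        calls["checkpointed"] += 1
        return forward(x)       # real code would save inputs here and
                                # recompute activations during backward
    return wrapped, calls

f, calls = make_checkpointed(lambda x: x * 2)
assert f(3, grad_enabled=False) == 6
assert f(3, grad_enabled=True) == 6
assert calls == {"plain": 1, "checkpointed": 1}
```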
  20. 08 Jun, 2021 1 commit
  21. 01 Jun, 2021 3 commits
  22. 28 May, 2021 2 commits
  23. 27 May, 2021 3 commits
  24. 26 May, 2021 1 commit