1. 12 Nov, 2021 1 commit
    • Anupam Bhatnagar's avatar
      Setup pre-commit github action and apply pre-commit to all files (#849) · 7d7edf6d
      Anupam Bhatnagar authored
      * adding pre-commit files
      
      * applying pre-commit to all files
      
      * adding no-strict-optional argument to mypy in circle ci config
      
      * fix typo
      
      * updating python versions
      
      * [skip ci] remove extra args
      
      * adding python 3.9
      
      * [skip ci] set pre-commit version in requirements-dev.txt
      
      * set CACHE_VERSION
      
      * move linters from circleci to github actions
      
      * update python version
      
      * update python version in benchmarks_2
      
      * moving to python 3.9.7
      7d7edf6d
  2. 09 Nov, 2021 1 commit
  3. 08 Nov, 2021 2 commits
  4. 05 Nov, 2021 1 commit
    • Min Xu's avatar
      [feat] experimental MEVO layer (#840) · 8347c1a2
      Min Xu authored
      
      
      * [feat] MEVO kernel
      
      - initial import from min/softmax and min/testing branches
      - need to rename and further cleanup
      
      * only test with newer pytorch
      
      * renamed and added comments and code cleanup
      
      * rename and reduce test memory
      
      * testing
      
      * minor fixing
      
      * fixing
      
      * more fix
      
      * changelog
      
      * more 1.7 and 1.8 paper cuts
      
      * remove dead code
      
      * addressed Benjamin's comments
      
      * addressed more comments
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      8347c1a2
  5. 01 Nov, 2021 2 commits
    • Min Xu's avatar
      [feat] [FSDP]: add experimental support to shared weights (#836) · f2af4c66
      Min Xu authored
      
      
      * added a new test, passing without shared weights
      
      * tested weight sharing
      
      * added the test to test list file
      
      * extended to world_size = 2
      
      * fixed test
      
      * [feat]: add limited and experimental support for shared parameter
      
      * fixed tests
      
      * simplify to work with layer with at least 1 non-shared params and add code to pick up linked_param field for sharding the shared param
      
      * fixed the case where linked param is not in separate FSDP
      
      * changelog and remove old code
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      f2af4c66
    • anj-s's avatar
      [feature] Add the low level SSD APIs (#829) · a9fcaa28
      anj-s authored
      * add doc strings
      
      * add lower level SSD APIs and tests
      
      * add the test to the list to be run
      
      * remove unused imports
      
      * more doc string changes
      
      * fix lint errors
      a9fcaa28
  6. 28 Oct, 2021 1 commit
  7. 27 Oct, 2021 4 commits
  8. 22 Oct, 2021 1 commit
    • Eugen Hotaj's avatar
      Extend auto shard capabilities to work around torch.fx edge cases. (#817) · 7bdf50a3
      Eugen Hotaj authored
      auto_shard.py currently uses torch.fx to create a symbolic DAG of
      operations and linearizes that DAG into an nn.Sequential so it can later
      be used for model offloading. This works in most cases but runs into
      issues for certain eager mode features, such as dynamic conditionals,
      shape-dependent computation, etc.
      
      This PR extends auto_shard.py to first run a preprocessing step which wraps
      any nn.Module which cannot be traced through. It adds a test for dynamic
      conditionals and updates existing failing test code.
      
      There are some immediate extensions to this approach which are marked as
      TODO in the code.
      7bdf50a3
  9. 21 Oct, 2021 1 commit
    • anj-s's avatar
      [chore] Update the PyTorch version that we run CPU tests with (#809) · 11a24161
      anj-s authored
      * update python version for cpu tess
      
      * run CPU tests with updated PyTorch version
      
      * update nightly and test PyTorch versions
      
      * skip failing multiprocess pipe test
      
      * always skip test
      
      * always skip test
      
      * always skip test
      
      * lint error
      
      * skip unsupported versions
      
      * improve skip message
      
      * lint errors
      11a24161
  10. 20 Oct, 2021 1 commit
    • Quentin Duval's avatar
      [feat] layer memory tracking (#808) · ad92220c
      Quentin Duval authored
      
      
      * [feat] layer memory tracking
      
      * [feat] layer memory tracking (add tests in CI)
      
      * [feat] layer memory tracking: doc typos
      
      * [feat] layer memory tracking: mypy fixes
      
      * [feat] layer memory tracking: fixes for FSDP all gather tracking on pytorch 1.9 and above
      
      * [feat] layer memory tracking: lint
      
      * [feat] layer memory tracking: mypy
      Co-authored-by: default avatarQuentinDuval <QuentinDuval@users.noreply.github.com>
      ad92220c
  11. 13 Sep, 2021 1 commit
  12. 12 Sep, 2021 1 commit
    • Darryl Barnhart's avatar
      [fix] FSDP intra-backwards gradient accumulation. (#784) · 4fa2ab9b
      Darryl Barnhart authored
      * [fix] FSDP intra-backwards gradient accumulation.
      
      Ensure gradient reduction accumulates into the unsharded gradient tensor
      within a backwards pass. This matters when an FSDP module is called
      multiple times within a forward pass, and reduction is _not_ deferred
      using activation checkpoint forward counters, bucketing or some other
      mechanism.
      
      Closes #780
      
      * [refactor] Remove forward counters. Comments.
      
      Removed forward counters from the activation checkpointing utility, now
      that FSDP does not require them for correct operation. Add more detailed
      comment about memory usage behaviour with gradient reduction.
      
      * [refactor] Delete deprecated forward counter usage.
      
      * [refactor] Add state assertion as end of pre-backward hook.
      4fa2ab9b
  13. 11 Sep, 2021 1 commit
    • Alex Xiao's avatar
      [feat] set requires_grad of output tensors of checkpointed modules properly (#787) · 482944d9
      Alex Xiao authored
      
      
      Before this commit, output tensors of checkpointed modules always
      require grad, even if they shouldn't. This commit makes it so that
      the outputs of checkpointed modules only require grad if either
      the input requires grad or if the parameters require grad.
      
      To achieve this, this commit also adds a new _unflattened_param_views
      attribute to modules being flattened. This allows the checkpointing
      to still access the parameters and check if gradients need to be
      computed.
      Co-authored-by: default avatarAlex Xiao <axiao@fb.com>
      482944d9
  14. 10 Sep, 2021 1 commit
  15. 07 Sep, 2021 1 commit
  16. 06 Sep, 2021 1 commit
    • Min Xu's avatar
      [cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup;... · 3ecf76f4
      Min Xu authored
      
      [cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)
      
      * changelog; mypy; oss cleanup
      
      * more broadcast_object cleanup in FSDP
      
      * one more mypy fix
      
      * retire pytorch 1.6 from circleci, add new lightly, add 1.8 LTS and 1.9 stable release
      
      * update torch version for LTS
      
      * minor fixes
      
      * update cache key
      
      * trying newer gpu VMs
      
      * bump the cache
      
      * update to gpu.medium, which should be 2 GPUs
      
      * update nightly version
      
      * add pre-commit instruction
      
      * fixed CHANGELOG after merging
      
      * updated to newer nightly
      
      * retained the older broadcast function for older GPUs for oss.py
      
      * fixed a bug
      
      * added a comment
      
      * fixing a test for pytorch 1.10
      
      * testing a fix
      
      * Update fairscale/optim/oss.py
      
      * Update CONTRIBUTING.md
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      3ecf76f4
  17. 12 Aug, 2021 2 commits
  18. 31 Jul, 2021 1 commit
  19. 30 Jul, 2021 1 commit
    • Yanli Zhao's avatar
      [FSDP] Move final backward callback queueing to pre-backward hook of root instance (#753) · ba7df621
      Yanli Zhao authored
      Move final backward callback to pre-backward hook of root FSDP instance
      
      Summary:
      
      Move final backward callback to pre-backward hook of root FSDP instance,
      so that it is always attached to the outer most backward call and fired
      after all backward calls are completed.
      
      Also added flags to check final backward callback is fired when final
      backward callback is required.
      
      If root FSDP is checkpointed and called multiple times in forward,
      check pointer counter is used to make sure final backward callback is queued inside last inner backward
      call as well.
      
      Test Plan: unit tests
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * reformat
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * nits and unit tests
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * address some comments
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * replace m with self
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * reformat
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * nits
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * remove the fired flag
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * assert state on root only
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * comments
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * comments
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      ba7df621
  20. 27 Jul, 2021 1 commit
  21. 26 Jul, 2021 1 commit
    • Min Xu's avatar
      [feat]: prepare FSDP to handle multiple flatten params and fixed metadata saving for MoE (#746) · 83b0b49e
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 3: make FSDP use FlattenParamModule unconditionally
      
      * fixing the auto_wrap tests
      
      * minor
      
      * rewrite local_metadata_dict
      
      - updated FPW so that custom flat param name is also supported
      
      * bug fix
      
      * mypy
      
      * rewrote consolidate_shard_weights
      
      - test_consolidate passes
      
      * comments
      
      * fixing pickling
      
      * Fix shared params and MoE logic (#749)
      
      * add strict kwarg to support fairseq:gshard MoE saving logic
      
      * Test fairseq style shard
      
      * style
      
      * formatting and address comments
      
      * added changelog
      
      * fixing a test after padding renaming
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      Co-authored-by: default avatarSam Shleifer <sshleifer@gmail.com>
      83b0b49e
  22. 19 Jul, 2021 1 commit
  23. 07 Jul, 2021 1 commit
  24. 28 Jun, 2021 2 commits
  25. 26 Jun, 2021 1 commit
  26. 25 Jun, 2021 2 commits
  27. 22 Jun, 2021 1 commit
    • Pavel Belevich's avatar
      Update torch to 1.9.0 release (#717) · 1cc4c837
      Pavel Belevich authored
      * Update torch to 1.9.0.dev20210614+cu102
      
      * Update config.yml
      
      * Update config.yml
      
      * Update setup.py
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
      1cc4c837
  28. 21 Jun, 2021 1 commit
    • Min Xu's avatar
      [feat] FSDP: supporting multiple flatten parameter groups (#711) · ab71efb3
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 2: extending FPW to support multiple flat params groups
      - FSDP still only use one group
      - unit test does this the new code paths
      - updated the changelog
      
      * first cut, mypy passed
      
      * test_flatten_params_wrapper.py::TestFlattenParams tests pass
      
      * added two more test cases and fixed a case in the code
      
      * fixed one bug with param_path_infos
      
      * fixed two more tests with hardcoded flat_param names
      
      * Update CHANGELOG.md
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      ab71efb3
  29. 11 Jun, 2021 2 commits
    • anj-s's avatar
      [Offload][feature] Add auto shard functionality to remove requirement of... · cbeda830
      anj-s authored
      [Offload][feature] Add auto shard functionality to remove requirement of nn.Sequential models. (#695)
      
      * auto wrap functionality
      
      * lint and doc strings
      
      * fix lint errors
      
      * lint errors and version skips
      
      * remove mypy checking and add conditional import
      
      * another math.prod instance
      
      * another import fix
      
      * address comments
      
      * lint errors
      
      * address comments
      
      * fix lint errors
      
      * add placeholder nodes to tracker list
      cbeda830
    • Pete's avatar
      Use original forward pass directly when in eval mode from within checkpoint wrapper (#709) · 370b8483
      Pete authored
      * add failing test
      
      * add fix
      
      * use 'torch.is_grad_enabled()' instead of 'module.training'
      
      * Revert "add failing test"
      
      This reverts commit 1c34242208f9b2c5fa6c8f181434c2be6d7cdbc0.
      
      * add simple test
      
      * improve test
      
      * add check for fwd_counter
      
      * revert typing/format changes
      
      * move to new test file
      
      * CHANGELOG
      
      * remove old test
      
      * fix import order
      
      * fix test to be compat with torch 1.6.0
      
      * clean up
      
      * comments
      
      * isort 🤦
      370b8483
  30. 08 Jun, 2021 1 commit
  31. 01 Jun, 2021 1 commit