1. 09 Nov, 2021 1 commit
  2. 08 Nov, 2021 2 commits
  3. 05 Nov, 2021 1 commit
    • Min Xu's avatar
      [feat] experimental MEVO layer (#840) · 8347c1a2
      Min Xu authored
      
      
      * [feat] MEVO kernel
      
      - initial import from min/softmax and min/testing branches
      - need to rename and further cleanup
      
      * only test with newer pytorch
      
      * renamed and added comments and code cleanup
      
      * rename and reduce test memory
      
      * testing
      
      * minor fixing
      
      * fixing
      
      * more fix
      
      * changelog
      
      * more 1.7 and 1.8 paper cuts
      
      * remove dead code
      
      * addressed Benjamin's comments
      
      * addressed more comments
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      8347c1a2
  4. 01 Nov, 2021 1 commit
    • Min Xu's avatar
      [feat] [FSDP]: add experimental support to shared weights (#836) · f2af4c66
      Min Xu authored
      
      
      * added a new test, passing without shared weights
      
      * tested weight sharing
      
      * added the test to test list file
      
      * extended to world_size = 2
      
      * fixed test
      
      * [feat]: add limited and experimental support for shared parameter
      
      * fixed tests
      
      * simplify to work with layer with at least 1 non-shared params and add code to pick up linked_param field for sharding the shared param
      
      * fixed the case where linked param is not in separate FSDP
      
      * changelog and remove old code
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      f2af4c66
  5. 28 Oct, 2021 1 commit
  6. 27 Oct, 2021 3 commits
  7. 13 Sep, 2021 1 commit
  8. 12 Sep, 2021 1 commit
    • Darryl Barnhart's avatar
      [fix] FSDP intra-backwards gradient accumulation. (#784) · 4fa2ab9b
      Darryl Barnhart authored
      * [fix] FSDP intra-backwards gradient accumulation.
      
      Ensure gradient reduction accumulates into the unsharded gradient tensor
      within a backwards pass. This matters when an FSDP module is called
      multiple times within a forward pass, and reduction is _not_ deferred
      using activation checkpoint forward counters, bucketing or some other
      mechanism.
      
      Closes #780
      
      * [refactor] Remove forward counters. Comments.
      
      Removed forward counters from the activation checkpointing utility, now
      that FSDP does not require them for correct operation. Add more detailed
      comment about memory usage behaviour with gradient reduction.
      
      * [refactor] Delete deprecated forward counter usage.
      
      * [refactor] Add state assertion as end of pre-backward hook.
      4fa2ab9b
  9. 11 Sep, 2021 1 commit
    • Alex Xiao's avatar
      [feat] set requires_grad of output tensors of checkpointed modules properly (#787) · 482944d9
      Alex Xiao authored
      
      
      Before this commit, output tensors of checkpointed modules always
      require grad, even if they shouldn't. This commit makes it so that
      the outputs of checkpointed modules only require grad if either
      the input requires grad or if the parameters require grad.
      
      To achieve this, this commit also adds a new _unflattened_param_views
      attribute to modules being flattened. This allows the checkpointing
      to still access the parameters and check if gradients need to be
      computed.
      Co-authored-by: default avatarAlex Xiao <axiao@fb.com>
      482944d9
  10. 07 Sep, 2021 1 commit
  11. 06 Sep, 2021 1 commit
    • Min Xu's avatar
      [cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup;... · 3ecf76f4
      Min Xu authored
      
      [cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)
      
      * changelog; mypy; oss cleanup
      
      * more broadcast_object cleanup in FSDP
      
      * one more mypy fix
      
      * retire pytorch 1.6 from circleci, add new lightly, add 1.8 LTS and 1.9 stable release
      
      * update torch version for LTS
      
      * minor fixes
      
      * update cache key
      
      * trying newer gpu VMs
      
      * bump the cache
      
      * update to gpu.medium, which should be 2 GPUs
      
      * update nightly version
      
      * add pre-commit instruction
      
      * fixed CHANGELOG after merging
      
      * updated to newer nightly
      
      * retained the older broadcast function for older GPUs for oss.py
      
      * fixed a bug
      
      * added a comment
      
      * fixing a test for pytorch 1.10
      
      * testing a fix
      
      * Update fairscale/optim/oss.py
      
      * Update CONTRIBUTING.md
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      3ecf76f4
  12. 12 Aug, 2021 2 commits
  13. 31 Jul, 2021 1 commit
  14. 30 Jul, 2021 1 commit
    • Yanli Zhao's avatar
      [FSDP] Move final backward callback queueing to pre-backward hook of root instance (#753) · ba7df621
      Yanli Zhao authored
      Move final backward callback to pre-backward hook of root FSDP instance
      
      Summary:
      
      Move final backward callback to pre-backward hook of root FSDP instance,
      so that it is always attached to the outer most backward call and fired
      after all backward calls are completed.
      
      Also added flags to check final backward callback is fired when final
      backward callback is required.
      
      If root FSDP is checkpointed and called multiple times in forward,
      check pointer counter is used to make sure final backward callback is queued inside last inner backward
      call as well.
      
      Test Plan: unit tests
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * reformat
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * nits and unit tests
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * address some comments
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * replace m with self
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * reformat
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * nits
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * remove the fired flag
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * assert state on root only
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * comments
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      
      * comments
      
      Summary:
      
      Test Plan:
      
      Reviewers:
      
      Subscribers:
      
      Tasks:
      
      Tags:
      ba7df621
  15. 26 Jul, 2021 1 commit
    • Min Xu's avatar
      [feat]: prepare FSDP to handle multiple flatten params and fixed metadata saving for MoE (#746) · 83b0b49e
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 3: make FSDP use FlattenParamModule unconditionally
      
      * fixing the auto_wrap tests
      
      * minor
      
      * rewrite local_metadata_dict
      
      - updated FPW so that custom flat param name is also supported
      
      * bug fix
      
      * mypy
      
      * rewrote consolidate_shard_weights
      
      - test_consolidate passes
      
      * comments
      
      * fixing pickling
      
      * Fix shared params and MoE logic (#749)
      
      * add strict kwarg to support fairseq:gshard MoE saving logic
      
      * Test fairseq style shard
      
      * style
      
      * formatting and address comments
      
      * added changelog
      
      * fixing a test after padding renaming
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      Co-authored-by: default avatarSam Shleifer <sshleifer@gmail.com>
      83b0b49e
  16. 19 Jul, 2021 1 commit
  17. 07 Jul, 2021 1 commit
  18. 28 Jun, 2021 1 commit
    • Yanli Zhao's avatar
      Make sure requires_grad of FlatParameter to be consistent with requires_grad... · 91c7dd05
      Yanli Zhao authored
      Make sure requires_grad of FlatParameter to be consistent with requires_grad of original parameters (#721)
      
      * Make sure requires_grad of FlatParameter to be consistent with requires_grad of original parameters
      
      * Make sure requires_grad of FlatParameter to be consistent with requires_grad of original parameters
      91c7dd05
  19. 26 Jun, 2021 1 commit
  20. 21 Jun, 2021 1 commit
    • Min Xu's avatar
      [feat] FSDP: supporting multiple flatten parameter groups (#711) · ab71efb3
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 2: extending FPW to support multiple flat params groups
      - FSDP still only use one group
      - unit test does this the new code paths
      - updated the changelog
      
      * first cut, mypy passed
      
      * test_flatten_params_wrapper.py::TestFlattenParams tests pass
      
      * added two more test cases and fixed a case in the code
      
      * fixed one bug with param_path_infos
      
      * fixed two more tests with hardcoded flat_param names
      
      * Update CHANGELOG.md
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      ab71efb3
  21. 11 Jun, 2021 1 commit
    • Pete's avatar
      Use original forward pass directly when in eval mode from within checkpoint wrapper (#709) · 370b8483
      Pete authored
      * add failing test
      
      * add fix
      
      * use 'torch.is_grad_enabled()' instead of 'module.training'
      
      * Revert "add failing test"
      
      This reverts commit 1c34242208f9b2c5fa6c8f181434c2be6d7cdbc0.
      
      * add simple test
      
      * improve test
      
      * add check for fwd_counter
      
      * revert typing/format changes
      
      * move to new test file
      
      * CHANGELOG
      
      * remove old test
      
      * fix import order
      
      * fix test to be compat with torch 1.6.0
      
      * clean up
      
      * comments
      
      * isort 🤦
      370b8483
  22. 08 Jun, 2021 1 commit
  23. 01 Jun, 2021 1 commit
  24. 28 May, 2021 1 commit
  25. 17 May, 2021 2 commits
    • Min Xu's avatar
      [fix] auto_wrap: support wrapping based on wrapper_config (#685) · 9d2bbcf2
      Min Xu authored
      
      
      * [fix] auto_wrap: support wrapping based on wrapper_config
      
      - user can use this to avoid assert if auto_wrap is used multiple times on a module
      - user can traverse the modules multiple times and assign a wrapper_config
        to the module and then use auto_wrap once to wrap them
      
      fix #649
      fix #585
      
      * added changelog
      
      * fix tests
      
      * fix a test
      
      * added an optional assert for collision based on discussions with Quentin
      
      * added config_auto_wrap_policy
      
      * lint
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      9d2bbcf2
    • Quentin Duval's avatar
      [feat] Save FSDP metadata for offline unflattening + Consolidate checkpoints (#683) · 81c20f72
      Quentin Duval authored
      
      
      * Save FSDP metadata for offline unflattening
      
      * Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstruct a consolidated checkpoint from a sharded checkpoint
      
      * Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstruct a consolidated checkpoint from a sharded checkpoint
      
      * Add a unit test to show how to use the function
      
      * Code review + improvement of the unit tests
      
      * Code review: extract clean_path
      
      * Make meta data and consolidation of checkpoint work for flatten_parameter=False
      
      * Add new unit test file in CI
      
      * Complete changelog and fix mypy issues
      
      * Add support for module buffers in the consolidation of sharded checkpoints
      
      * Better support for module buffers: save them in the meta data
      
      * Refactoring: use a data-format for the meta data that is simpler to understand (move from object of array to array of object format)
      
      * Renaming to make code clearer
      
      * Code review: in_temporary_directory rework and typo correction
      
      * Renaming
      Co-authored-by: default avatarSam Shleifer <sshleifer@gmail.com>
      Co-authored-by: default avatarQuentinDuval <QuentinDuval@users.noreply.github.com>
      81c20f72
  26. 14 May, 2021 1 commit
  27. 13 May, 2021 1 commit
    • Min Xu's avatar
      [fix] add and use get_process_group_cached (#678) · bde4bac5
      Min Xu authored
      * [fix] add and use get_process_group_cached
      
      - This commit makes FSDP avoid making too many process groups by default
      - Extra process group is bad for GPU memory and init time
      
      * add changelog
      
      * lint
      
      * note on speed
      
      * add better assert output
      
      test seems to be flaky:
      https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps
      
      
      
      * update test reference memory values
      
      - With cached process groups, the memory is reduced as reported by
      pytorch as well (due to bucket buffer memory for the reduction buffer)
      - The effect on memory is actually more on the SMI memory, which is not
      reported by pytorch and checked by this test.
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update CHANGELOG.md
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * improved changelog
      
      * better handling of underscores in the md file
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      bde4bac5
  28. 12 May, 2021 1 commit
    • anj-s's avatar
      [chore] Rename and move checkpoint_activations from misc folder. (#654) · 72c6bab2
      anj-s authored
      * rename files
      
      * add newly renamed file
      
      * rename and move checkpoint activations related files
      
      * add test files to ci list
      
      * fix lint errors
      
      * modify docs
      
      * add changelog
      
      * retain old path for now
      
      * fix lint errors
      
      * add another import test case
      
      * fix merge conflict
      
      * add missing test file
      72c6bab2
  29. 11 May, 2021 1 commit
    • Min Xu's avatar
      [fix] FSDP forward pass overlap between compute and all-gather (#671) · 8a42a8e3
      Min Xu authored
      
      
      * [fix] FSDP forward pass overlap between compute and all-gather
      
      - much thanks for @cyanguwa for report and @QuentinDuval for debugging it
      - a new unit test is added to check for this and ensure we detect
        issue with overlapping and cpu/gpu blocking wait calls
      
      * fix
      
      * fix
      
      * fix
      
      * better assertion outputs
      
      * fix format and tune all_gather mb for CI
      
      * more tuning with non_flatten
      
      * undo an accidental change
      
      * tuning all gather mb and del model
      
      * Update + fix overlapping test to use patched all_gather w/ delay (#672)
      
      * fixing get_cycles_per_ms
      
      * add get_smi_memory
      
      * update the docstring
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      Co-authored-by: default avatarMyle Ott <myleott@fb.com>
      8a42a8e3
  30. 08 May, 2021 2 commits
  31. 07 May, 2021 1 commit
  32. 05 May, 2021 3 commits
    • Min Xu's avatar
      [fix] better assert and better test for frozen weights (#657) · b54eed1b
      Min Xu authored
      
      
      * [fix] better assert and better test for frozen weights
      
      - the precise condition should have been check m.parameters(), not
        m.params.
      - fixes #643
      
      * add changelog
      
      * use enum is so much better
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      b54eed1b
    • Benjamin Lefaudeux's avatar
      [draft][chore] SDP : increase code coverage (#653) · 69cbdf5d
      Benjamin Lefaudeux authored
      * increasing the code coverage, good practice and raising bugs.  hopefully getting to 100%
      * small bugfix
      69cbdf5d
    • Min Xu's avatar
      [fix] add clear_autocast_cache flag (#650) · 861b5ce2
      Min Xu authored
      
      
      * [fix] add clear_autocast_cache flag
      
      - when training in AMP model with weight dtype32, FSDP may need to
        optionally clear the autocast cache to avoid GPU OOM
      - this flag is default false, automatically doing it is a future TODO
      - also added a verbose flag to make print(fsdp_model) a bit shorter
      - updated the memory test to cover those new code
      - added a couple of useful functions in parallel.py and testing.py
      
      * minor
      
      * address comments
      
      * format
      
      * improve the test
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      861b5ce2