1. 12 Jul, 2021 2 commits
  2. 07 Jul, 2021 1 commit
  3. 28 Jun, 2021 4 commits
  4. 26 Jun, 2021 2 commits
  5. 25 Jun, 2021 3 commits
  6. 23 Jun, 2021 1 commit
  7. 22 Jun, 2021 1 commit
    • Update torch to 1.9.0 release (#717) · 1cc4c837
      Pavel Belevich authored
      * Update torch to 1.9.0.dev20210614+cu102
      
      * Update config.yml
      
      * Update config.yml
      
      * Update setup.py
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
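      The entry above is mostly a series of dependency and CI bumps. For context, here is a minimal, hypothetical sketch of what pinning the torch release in setup.py can look like; the package name and exact specifier are illustrative, not the project's actual metadata.

      ```python
      # Hypothetical sketch: moving a torch dependency from a nightly pre-release
      # to the 1.9.0 release in setup.py (names and versions are illustrative).
      from setuptools import find_packages, setup

      setup(
          name="example_package",
          version="0.0.1",
          packages=find_packages(),
          install_requires=[
              "torch >= 1.9.0",  # replaces a 1.9.0.dev nightly pin
          ],
      )
      ```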
  8. 21 Jun, 2021 1 commit
    • [feat] FSDP: supporting multiple flatten parameter groups (#711) · ab71efb3
      Min Xu authored
      
      
      * [feat] FSDP: supporting multiple flatten parameter groups
      
      - step 2: extending FPW to support multiple flat param groups
      - FSDP itself still uses only one group
      - unit tests cover the new code paths
      - updated the changelog
      
      * first cut, mypy passed
      
      * test_flatten_params_wrapper.py::TestFlattenParams tests pass
      
      * added two more test cases and fixed a case in the code
      
      * fixed one bug with param_path_infos
      
      * fixed two more tests with hardcoded flat_param names
      
      * Update CHANGELOG.md
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
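      The entry above extends FSDP's FlattenParamsWrapper (FPW) so that parameters can be flattened in more than one group. The snippet below is only a sketch of the underlying idea, not fairscale's FPW API: each group of parameters is concatenated into one flat tensor, and shaped views into that tensor stand in for the originals (the real wrapper also rewires the owning modules).

      ```python
      import torch
      import torch.nn as nn

      def flatten_param_groups(groups):
          """Sketch: flatten each group of parameters into one flat nn.Parameter.

          `groups` is a list of lists of nn.Parameter. Returns the flat parameters
          and, for each group, views into the flat tensor shaped like the originals.
          """
          flat_params, all_views = [], []
          for group in groups:
              flat = nn.Parameter(torch.cat([p.detach().reshape(-1) for p in group]))
              views, offset = [], 0
              for p in group:
                  n = p.numel()
                  views.append(flat[offset : offset + n].view(p.shape))
                  offset += n
              flat_params.append(flat)
              all_views.append(views)
          return flat_params, all_views

      # Usage: two groups flattened independently, e.g. to give each group its own options.
      m = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
      flats, views = flatten_param_groups([list(m[0].parameters()), list(m[1].parameters())])
      ```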
  9. 14 Jun, 2021 1 commit
  10. 11 Jun, 2021 3 commits
    • [Offload][feature] Add auto shard functionality to remove requirement of... · cbeda830
      anj-s authored
      [Offload][feature] Add auto shard functionality to remove requirement of nn.Sequential models. (#695)
      
      * auto wrap functionality
      
      * lint and doc strings
      
      * fix lint errors
      
      * lint errors and version skips
      
      * remove mypy checking and add conditional import
      
      * another math.prod instance
      
      * another import fix
      
      * address comments
      
      * lint errors
      
      * address comments
      
      * fix lint errors
      
      * add placeholder nodes to tracker list
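      The feature above lifts the nn.Sequential requirement for offload by auto-sharding arbitrary models. A rough sketch of the core idea, assuming torch.fx is available (hence the conditional import mentioned above): trace the model into a graph and bucket its nodes into segments. The helper below is hypothetical and only groups nodes; the real feature also has to turn each segment back into a runnable submodule.

      ```python
      import torch
      import torch.nn as nn

      try:
          import torch.fx  # torch >= 1.8; hence the conditional import in the commit above
          HAS_FX = True
      except ImportError:
          HAS_FX = False

      def group_nodes_into_segments(model: nn.Module, num_segments: int):
          """Sketch: trace `model` with torch.fx and split its graph nodes into
          roughly equal segments; placeholder nodes (the graph inputs) are tracked
          separately, echoing the last bullet above."""
          assert HAS_FX, "torch.fx is required for auto-sharding"
          traced = torch.fx.symbolic_trace(model)
          placeholders = [n for n in traced.graph.nodes if n.op == "placeholder"]
          body = [n for n in traced.graph.nodes if n.op not in ("placeholder", "output")]
          per_segment = max(1, len(body) // num_segments)
          segments = [body[i : i + per_segment] for i in range(0, len(body), per_segment)]
          return traced, placeholders, segments

      # Usage: a non-Sequential model can now be traced and partitioned.
      class TwoBranch(nn.Module):
          def __init__(self):
              super().__init__()
              self.a, self.b = nn.Linear(4, 4), nn.Linear(4, 4)

          def forward(self, x):
              return self.a(x) + self.b(x)

      traced, inputs, segments = group_nodes_into_segments(TwoBranch(), num_segments=2)
      ```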
    • remove examples dir (#712) · 7bdb9a7f
      anj-s authored
    • Use original forward pass directly when in eval mode from within checkpoint wrapper (#709) · 370b8483
      Pete authored
      * add failing test
      
      * add fix
      
      * use 'torch.is_grad_enabled()' instead of 'module.training'
      
      * Revert "add failing test"
      
      This reverts commit 1c34242208f9b2c5fa6c8f181434c2be6d7cdbc0.
      
      * add simple test
      
      * improve test
      
      * add check for fwd_counter
      
      * revert typing/format changes
      
      * move to new test file
      
      * CHANGELOG
      
      * remove old test
      
      * fix import order
      
      * fix test to be compat with torch 1.6.0
      
      * clean up
      
      * comments
      
      * isort 🤦
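      The fix above makes the checkpoint wrapper fall back to the wrapped module's original forward when gradients are not needed, keyed on torch.is_grad_enabled() rather than module.training. Below is a deliberately simplified, hypothetical wrapper illustrating that decision; it is not fairscale's checkpoint_wrapper, which also handles kwargs, RNG state, and other details.

      ```python
      import torch
      import torch.nn as nn
      from torch.utils.checkpoint import checkpoint

      class SimpleCheckpointWrapper(nn.Module):
          """Sketch: only route through activation checkpointing when it can help."""

          def __init__(self, module: nn.Module):
              super().__init__()
              self.module = module

          def forward(self, *args):
              # With gradients disabled (e.g. under torch.no_grad() at eval time)
              # there is no backward pass, so recomputing activations would be
              # pure overhead: call the original forward directly.
              if not torch.is_grad_enabled():
                  return self.module(*args)
              return checkpoint(self.module, *args)

      # Usage sketch: inference bypasses checkpointing entirely.
      layer = SimpleCheckpointWrapper(nn.Linear(8, 8))
      with torch.no_grad():
          out = layer(torch.randn(2, 8))
      ```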
  11. 08 Jun, 2021 1 commit
  12. 01 Jun, 2021 3 commits
  13. 28 May, 2021 2 commits
  14. 27 May, 2021 3 commits
  15. 26 May, 2021 2 commits
  16. 21 May, 2021 1 commit
    • [refactor] ShardedGradScaler init and super call (#691) · 945b9666
      Nicholas Cilfone authored
      Make ShardedGradScaler __init__ mirror GradScaler so that the parameters can be forwarded to super. Without this, one cannot configure a ShardedGradScaler object the way one can with the PyTorch-native GradScaler object.
      Updated with the black linter.
      Added a stub for GradScaler __init__, which resolves the mypy issues and removes the need for the ignore comment.
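      The refactor above gives ShardedGradScaler the same constructor arguments as torch.cuda.amp.GradScaler and forwards them to super(). A minimal sketch of the shape of that change follows (illustrative only, not the exact fairscale class, which also carries sharding-specific state):

      ```python
      from torch.cuda.amp import GradScaler

      class ShardedGradScalerSketch(GradScaler):
          """Sketch: mirror GradScaler.__init__ so every knob can be forwarded."""

          def __init__(
              self,
              init_scale: float = 2.0 ** 16,
              growth_factor: float = 2.0,
              backoff_factor: float = 0.5,
              growth_interval: int = 2000,
              enabled: bool = True,
          ) -> None:
              super().__init__(
                  init_scale=init_scale,
                  growth_factor=growth_factor,
                  backoff_factor=backoff_factor,
                  growth_interval=growth_interval,
                  enabled=enabled,
              )

      # The sharded scaler can now be configured just like the native GradScaler:
      scaler = ShardedGradScalerSketch(init_scale=2.0 ** 10, growth_interval=500)
      ```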
  17. 18 May, 2021 2 commits
  18. 17 May, 2021 2 commits
    • [fix] auto_wrap: support wrapping based on wrapper_config (#685) · 9d2bbcf2
      Min Xu authored
      
      
      * [fix] auto_wrap: support wrapping based on wrapper_config
      
      - users can use this to avoid the assert when auto_wrap is applied multiple times to a module
      - users can traverse the modules multiple times, assign a wrapper_config
        to each module, and then call auto_wrap once to wrap them
      
      fix #649
      fix #585
      
      * added changelog
      
      * fix tests
      
      * fix a test
      
      * added an optional assert for collision based on discussions with Quentin
      
      * added config_auto_wrap_policy
      
      * lint
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
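      The fix above lets auto_wrap act on modules that carry a wrapper_config, so users can tag modules over as many traversals as they like and then wrap everything with a single auto_wrap call. The helper below is a hypothetical illustration of that tagging idea, not fairscale's auto_wrap or config_auto_wrap_policy:

      ```python
      import torch.nn as nn

      def wrap_tagged_modules(model: nn.Module, wrapper_cls) -> nn.Module:
          """Sketch: wrap every child that was tagged with a `wrapper_config` dict.

          Users may traverse the model any number of times, attaching
          `child.wrapper_config = {...}` wherever a wrapper is wanted, then call
          this helper once.
          """
          for name, child in list(model.named_children()):
              wrap_tagged_modules(child, wrapper_cls)  # wrap leaves before parents
              cfg = getattr(child, "wrapper_config", None)
              if cfg is not None:
                  setattr(model, name, wrapper_cls(child, **cfg))
          return model

      # Usage sketch with a trivial wrapper class standing in for e.g. FSDP:
      class TagWrapper(nn.Module):
          def __init__(self, module: nn.Module, tag: str = ""):
              super().__init__()
              self.module, self.tag = module, tag

          def forward(self, x):
              return self.module(x)

      model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
      model[1].wrapper_config = {"tag": "last layer"}
      wrap_tagged_modules(model, TagWrapper)
      ```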
    • [feat] Save FSDP metadata for offline unflattening + Consolidate checkpoints (#683) · 81c20f72
      Quentin Duval authored
      
      
      * Save FSDP metadata for offline unflattening
      
      * Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstructs a consolidated checkpoint from a sharded checkpoint
      
      * Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstructs a consolidated checkpoint from a sharded checkpoint
      
      * Add a unit test to show how to use the function
      
      * Code review + improvement of the unit tests
      
      * Code review: extract clean_path
      
      * Make the meta data and the consolidation of checkpoints work for flatten_parameter=False
      
      * Add new unit test file in CI
      
      * Complete changelog and fix mypy issues
      
      * Add support for module buffers in the consolidation of sharded checkpoints
      
      * Better support for module buffers: save them in the meta data
      
      * Refactoring: use a data format for the meta data that is simpler to understand (move from an object-of-arrays to an array-of-objects format)
      
      * Renaming to make code clearer
      
      * Code review: in_temporary_directory rework and typo correction
      
      * Renaming
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
      Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
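      The feature above stores enough metadata alongside each sharded FSDP checkpoint to rebuild a consolidated state dict offline. Below is a heavily simplified sketch of the consolidation step, under the assumption that every rank saves a 1-D, possibly padded shard of each parameter and that the metadata records the original shapes; the names and layout are hypothetical, not fairscale's actual checkpoint format.

      ```python
      from typing import Dict, List, Sequence

      import torch

      def consolidate_shards(
          shards: List[Dict[str, torch.Tensor]],  # one shard dict per rank, in rank order
          shapes: Dict[str, Sequence[int]],       # metadata: original (unflattened) shapes
      ) -> Dict[str, torch.Tensor]:
          """Sketch: concatenate each parameter's shards, drop any padding, and
          restore the original shape recorded in the metadata."""
          full_state = {}
          for name, shape in shapes.items():
              flat = torch.cat([shard[name].reshape(-1) for shard in shards])
              numel = int(torch.Size(shape).numel())
              full_state[name] = flat[:numel].reshape(tuple(shape))
          return full_state

      # Usage sketch: two fake ranks holding halves of a padded 3x3 weight.
      w = torch.arange(9.0)
      shards = [{"weight": w[:5]}, {"weight": torch.cat([w[5:], torch.zeros(1)])}]
      print(consolidate_shards(shards, {"weight": (3, 3)})["weight"])
      ```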
  19. 14 May, 2021 4 commits
  20. 13 May, 2021 1 commit
    • [fix] add and use get_process_group_cached (#678) · bde4bac5
      Min Xu authored
      * [fix] add and use get_process_group_cached
      
      - This commit makes FSDP avoid creating too many process groups by default
      - Extra process groups are bad for GPU memory and init time
      
      * add changelog
      
      * lint
      
      * note on speed
      
      * add better assert output
      
      test seems to be flaky:
      https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps
      
      * update test reference memory values
      
      - With cached process groups, the memory reported by pytorch is reduced as
      well (due to the bucket buffer memory used for the reduction buffer)
      - The effect on memory is actually larger on the SMI memory, which is not
      reported by pytorch but is checked by this test.
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update CHANGELOG.md
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * improved changelog
      
      * better handling of underscores in the md file
      Co-authored-by: Min Xu <min.xu@acm.org>
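      The fix above adds a cached process-group helper so that FSDP instances sharing the same set of ranks reuse one group instead of creating a new one each time, saving GPU memory and init time. Below is a minimal sketch of such a cache, assuming torch.distributed has already been initialized; the signature and cache key are illustrative, not fairscale's exact get_process_group_cached API.

      ```python
      from typing import Dict, FrozenSet, Optional, Sequence

      import torch.distributed as dist

      # Hypothetical module-level cache, keyed by the set of member ranks.
      _GROUP_CACHE: Dict[FrozenSet[int], object] = {}

      def get_process_group_cached(ranks: Optional[Sequence[int]] = None):
          """Sketch: return an existing group for `ranks` if one was already made,
          otherwise create it once via dist.new_group(). Assumes that
          dist.init_process_group() has been called."""
          if ranks is None:
              return dist.group.WORLD
          key = frozenset(ranks)
          if key not in _GROUP_CACHE:
              # dist.new_group() is a collective and relatively expensive call,
              # so do it at most once per distinct rank set.
              _GROUP_CACHE[key] = dist.new_group(ranks=sorted(key))
          return _GROUP_CACHE[key]
      ```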