1. 01 Jun, 2021 2 commits
  2. 28 May, 2021 2 commits
  3. 27 May, 2021 3 commits
  4. 26 May, 2021 2 commits
  5. 21 May, 2021 1 commit
    • Nicholas Cilfone's avatar
      [refactor] ShardedGradScaler init and super call (#691) · 945b9666
      Nicholas Cilfone authored
      Make ShardedGradScaler __init__ mirror GradScaler so super can forward parameters. Without this one cannot configure a ShardedGradScaler object like one can with the PyTorch native GradScaler object.
      Updated with black linter.
      Added stub for GradScaler __init__ which solves mypy issues and removed
      ignore comment.
      945b9666
  6. 18 May, 2021 2 commits
  7. 17 May, 2021 2 commits
    • Min Xu's avatar
      [fix] auto_wrap: support wrapping based on wrapper_config (#685) · 9d2bbcf2
      Min Xu authored
      
      
      * [fix] auto_wrap: support wrapping based on wrapper_config
      
      - user can use this to avoid assert if auto_wrap is used multiple times on a module
      - user can traverse the modules multiple times and assign a wrapper_config
        to the module and then use auto_wrap once to wrap them
      
      fix #649
      fix #585
      
      * added changelog
      
      * fix tests
      
      * fix a test
      
      * added an optional assert for collision based on discussions with Quentin
      
      * added config_auto_wrap_policy
      
      * lint
      Co-authored-by: default avatarMin Xu <min.xu.public@gmail.com>
      9d2bbcf2
    • Quentin Duval's avatar
      [feat] Save FSDP metadata for offline unflattening + Consolidate checkpoints (#683) · 81c20f72
      Quentin Duval authored
      
      
      * Save FSDP metadata for offline unflattening
      
      * Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstruct a consolidated checkpoint from a sharded checkpoint
      
      * Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstruct a consolidated checkpoint from a sharded checkpoint
      
      * Add a unit test to show how to use the function
      
      * Code review + improvement of the unit tests
      
      * Code review: extract clean_path
      
      * Make meta data and consolidation of checkpoint work for flatten_parameter=False
      
      * Add new unit test file in CI
      
      * Complete changelog and fix mypy issues
      
      * Add support for module buffers in the consolidation of sharded checkpoints
      
      * Better support for module buffers: save them in the meta data
      
      * Refactoring: use a data-format for the meta data that is simpler to understand (move from object of array to array of object format)
      
      * Renaming to make code clearer
      
      * Code review: in_temporary_directory rework and typo correction
      
      * Renaming
      Co-authored-by: default avatarSam Shleifer <sshleifer@gmail.com>
      Co-authored-by: default avatarQuentinDuval <QuentinDuval@users.noreply.github.com>
      81c20f72
  8. 14 May, 2021 4 commits
  9. 13 May, 2021 1 commit
    • Min Xu's avatar
      [fix] add and use get_process_group_cached (#678) · bde4bac5
      Min Xu authored
      * [fix] add and use get_process_group_cached
      
      - This commit makes FSDP avoid making too many process groups by default
      - Extra process group is bad for GPU memory and init time
      
      * add changelog
      
      * lint
      
      * note on speed
      
      * add better assert output
      
      test seems to be flaky:
      https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps
      
      
      
      * update test reference memory values
      
      - With cached process groups, the memory is reduced as reported by
      pytorch as well (due to bucket buffer memory for the reduction buffer)
      - The effect on memory is actually more on the SMI memory, which is not
      reported by pytorch and checked by this test.
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      
      * Update CHANGELOG.md
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * Update fairscale/utils/parallel.py
      
      * improved changelog
      
      * better handling of underscores in the md file
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      bde4bac5
  10. 12 May, 2021 2 commits
  11. 11 May, 2021 1 commit
    • Min Xu's avatar
      [fix] FSDP forward pass overlap between compute and all-gather (#671) · 8a42a8e3
      Min Xu authored
      
      
      * [fix] FSDP forward pass overlap between compute and all-gather
      
      - much thanks for @cyanguwa for report and @QuentinDuval for debugging it
      - a new unit test is added to check for this and ensure we detect
        issue with overlapping and cpu/gpu blocking wait calls
      
      * fix
      
      * fix
      
      * fix
      
      * better assertion outputs
      
      * fix format and tune all_gather mb for CI
      
      * more tuning with non_flatten
      
      * undo an accidental change
      
      * tuning all gather mb and del model
      
      * Update + fix overlapping test to use patched all_gather w/ delay (#672)
      
      * fixing get_cycles_per_ms
      
      * add get_smi_memory
      
      * update the docstring
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      Co-authored-by: default avatarMyle Ott <myleott@fb.com>
      8a42a8e3
  12. 10 May, 2021 2 commits
    • Min Xu's avatar
      [chore] Updating PR template (#674) · c8d32c30
      Min Xu authored
      * [chore] Updating PR template
      
      Add N/A (Not Applicable) options to some of the questions in the PR template
      
      * Update PULL_REQUEST_TEMPLATE.md
      
      * Update PULL_REQUEST_TEMPLATE.md
      c8d32c30
    • Min Xu's avatar
      [minor] clarify a comment (#673) · 6c61887d
      Min Xu authored
      - we do have a use case of empty params inside a FSDP -- for the
      overlapping fsdp unit test, we use it to measure timing of compute
      when no params is needed for all_gather
      - therefore, I updated the comment to be more correct there.
      - fixes #661
      6c61887d
  13. 08 May, 2021 5 commits
  14. 07 May, 2021 3 commits
    • msbaines's avatar
      [perf] nn.moe: workaround inefficiency in PyTorch's one_hot (#666) · 99b30a04
      msbaines authored
      Workaround for https://github.com/pytorch/pytorch/issues/55579
      
      Co-authored-by: @shruti-bh, @myleott
      99b30a04
    • Min Xu's avatar
      [fix]: support pytorch SyncBatchNorm under AMP & checkpointing with FSDP (#659) · 6db68518
      Min Xu authored
      
      
      * [test]: add a more general test case
      
      - also rebalance the tests a bit
      
      * added missing arg
      
      * balance
      
      * better checking
      
      * balance
      
      * make test smaller and faster
      
      * make ddp results cached and enable sync_bn
      
      * clean up
      
      * fix tests
      
      * changelog
      
      * blance
      
      * fix
      
      * addressing comments
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      6db68518
    • msbaines's avatar
      [feat] experimental.nn.SyncBatchNorm: initial commit (#662) · f0a40046
      msbaines authored
      * [feat] experimental.nn.SyncBatchNorm: initial commit
      
      Fast/simple re-implementation of SyncBatchNorm.
      
      When profiling SSL Vision, I was seeing a majority of cycles spent in
      SyncBatchNorm. With this change, I see a 10% to 20% speedup on the
      model I was profiling.
      
      When running benchmarks/experimental/sync_batchnorm.py on 8 x V100,
      I get a 6x speedup:
      
      <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
      Elapsed time is  0.08709120750427246
      Elapsed time is  0.12632274627685547
      Elapsed time is  0.14095258712768555
      Elapsed time is  0.16529417037963867
      Elapsed time is  0.1419970989227295
      Elapsed time is  0.15166854858398438
      Elapsed time is  0.12000870704650879
      Elapsed time is  0.17534875869750977
      <class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
      Elapsed time is  2.5087168216705322
      Elapsed time is  2.497001886367798
      Elapsed time is  2.5204885005950928
      Elapsed time is  2.526789903640747
      Elapsed time is  2.5080230236053467
      Elapsed time is  2.524489641189575
      Elapsed time is  2.513214588165283
      Elapsed time is  2.5359973907470703
      <class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
      Elapsed time is  0.4126114845275879
      Elapsed time is  0.39051294326782227
      Elapsed time is  0.40685415267944336
      Elapsed time is  0.4159870147705078
      Elapsed time is  0.42383885383605957
      Elapsed time is  0.4080159664154053
      Elapsed time is  0.41202712059020996
      Elapsed time is  0.42400121688842773
      f0a40046
  15. 05 May, 2021 6 commits
  16. 04 May, 2021 1 commit
  17. 03 May, 2021 1 commit