1. 07 May, 2021 1 commit
    • msbaines's avatar
      [feat] experimental.nn.SyncBatchNorm: initial commit (#662) · f0a40046
      msbaines authored
      * [feat] experimental.nn.SyncBatchNorm: initial commit
      
      Fast/simple re-implementation of SyncBatchNorm.
      
      When profiling SSL Vision, I was seeing a majority of cycles spent in
      SyncBatchNorm. With this change, I see a 10% to 20% speedup on the
      model I was profiling.
      
      When running benchmarks/experimental/sync_batchnorm.py on 8 x V100,
      I get a 6x speedup:
      
      <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
      Elapsed time is  0.08709120750427246
      Elapsed time is  0.12632274627685547
      Elapsed time is  0.14095258712768555
      Elapsed time is  0.16529417037963867
      Elapsed time is  0.1419970989227295
      Elapsed time is  0.15166854858398438
      Elapsed time is  0.12000870704650879
      Elapsed time is  0.17534875869750977
      <class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
      Elapsed time is  2.5087168216705322
      Elapsed time is  2.497001886367798
      Elapsed time is  2.5204885005950928
      Elapsed time is  2.526789903640747
      Elapsed time is  2.5080230236053467
      Elapsed time is  2.524489641189575
      Elapsed time is  2.513214588165283
      Elapsed time is  2.5359973907470703
      <class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
      Elapsed time is  0.4126114845275879
      Elapsed time is  0.39051294326782227
      Elapsed time is  0.40685415267944336
      Elapsed time is  0.4159870147705078
      Elapsed time is  0.42383885383605957
      Elapsed time is  0.4080159664154053
      Elapsed time is  0.41202712059020996
      Elapsed time is  0.42400121688842773
      f0a40046
  2. 05 May, 2021 3 commits
    • Min Xu's avatar
      [fix] better assert and better test for frozen weights (#657) · b54eed1b
      Min Xu authored
      
      
      * [fix] better assert and better test for frozen weights
      
      - the precise condition should have been check m.parameters(), not
        m.params.
      - fixes #643
      
      * add changelog
      
      * use enum is so much better
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      b54eed1b
    • Benjamin Lefaudeux's avatar
      [draft][chore] SDP : increase code coverage (#653) · 69cbdf5d
      Benjamin Lefaudeux authored
      * increasing the code coverage, good practice and raising bugs.  hopefully getting to 100%
      * small bugfix
      69cbdf5d
    • Min Xu's avatar
      [fix] add clear_autocast_cache flag (#650) · 861b5ce2
      Min Xu authored
      
      
      * [fix] add clear_autocast_cache flag
      
      - when training in AMP model with weight dtype32, FSDP may need to
        optionally clear the autocast cache to avoid GPU OOM
      - this flag is default false, automatically doing it is a future TODO
      - also added a verbose flag to make print(fsdp_model) a bit shorter
      - updated the memory test to cover those new code
      - added a couple of useful functions in parallel.py and testing.py
      
      * minor
      
      * address comments
      
      * format
      
      * improve the test
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      861b5ce2
  3. 04 May, 2021 1 commit
  4. 03 May, 2021 1 commit
  5. 30 Apr, 2021 1 commit
  6. 29 Apr, 2021 2 commits
  7. 28 Apr, 2021 3 commits
    • Min Xu's avatar
      [test] improve BN test coverage (#638) · 21cba91b
      Min Xu authored
      
      
      * [test] improve BN test coverage
      
      - Added sync_bn on/off cases
      - Added conv and linear bias on/off cases
      - clarified when sync_bn is off, when is BN wrapping needed with the test
      
      * adding a comment
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      21cba91b
    • Mehdi Mirzazadeh's avatar
      adding auto graph generation for distributed pipeline (#615) · bdc0581b
      Mehdi Mirzazadeh authored
      * adding auto graph generation for distributed pipeline
      
      * ignore trace.py for my for now, since it needs pytorch 1.8
      
      * fixing tests
      
      * simplifying graph api
      
      * remove unused debug utilities
      
      * use inspect to find argument lists
      
      * use sharded linear layer
      
      * flkae8
      
      * comment
      
      * polishing
      
      * polishing
      bdc0581b
    • Min Xu's avatar
      [feat] save memory by using bucket buffer only in backward (#633) · a5594032
      Min Xu authored
      
      
      * [feat] save memory by using bucket buffer only in backward
      
      - this fixes bug #627
      - added documentation to clarify the buffer's cost and speed/memory
        tradeoff
      - added setup/teardown calls so that the buffer is only allocated
        during the backward pass, saving more memory for forward and stepping
        so that they can be used for things like activations.
      - added a unit test that assert the memory is in range.
      
      Comparing with DDP:
      
        1. buffer size scales with # of FSDP not model size
        2. buffer is only allocated during backward
        3. buffer is used for small tensors only to reduce overhead
        4. overlapping of compute-reduction is very different
      
      * add PR number to changelog
      
      * filled in with memory number on 1.9
      
      * addressed comments
      
      * update comments
      
      * fix for 1.6
      
      * add a todo
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      a5594032
  8. 26 Apr, 2021 1 commit
  9. 23 Apr, 2021 1 commit
    • shuyingsunshine21's avatar
      [FSDP] relax checking root condition (#620) · d3b86d65
      shuyingsunshine21 authored
      * relax checking root condition
      
      * formatting
      
      * add unittest
      
      * add unittest to ci test list
      
      * isort for import of unittest
      
      * format black .
      
      * move test to list 1
      
      * add skip no cuda
      
      * black and isort
      d3b86d65
  10. 22 Apr, 2021 2 commits
  11. 19 Apr, 2021 1 commit
    • Min Xu's avatar
      FSDP: fixing training with freezing weights (#614) · 24da3b11
      Min Xu authored
      
      
      * FSDP: fixing training with freezing weights
      
      - an assert is changed to catch this case correctly
      - unit test added (based on Quentin's test code) for this case and
        compare DDP and FSDP
      
      fixes: #610
      
      * added test file to list 1
      
      * Use better and simpler code as suggested by Myle
      
      * testing both methods of freezing as well
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      24da3b11
  12. 15 Apr, 2021 1 commit
  13. 13 Apr, 2021 3 commits
  14. 08 Apr, 2021 1 commit
  15. 07 Apr, 2021 2 commits
  16. 06 Apr, 2021 1 commit
  17. 05 Apr, 2021 1 commit
  18. 04 Apr, 2021 3 commits
  19. 02 Apr, 2021 1 commit
  20. 01 Apr, 2021 1 commit
  21. 31 Mar, 2021 4 commits
    • msbaines's avatar
    • anj-s's avatar
      [offload] Audit OffloadModel API, add error messages and remove redundant code path. (#557) · 34384e1b
      anj-s authored
      * renaming/adding error messages
      
      * address comments
      
      * address comments
      
      * add more comments
      
      * add more comments
      34384e1b
    • Min Xu's avatar
      [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed... · a0458b98
      Min Xu authored
      [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556)
      
      * [fix] disable single rank process group for auto_wrap_bn
      
      - beefed up unit test with regnet-like model
      - found that single-rank process group is causing problem
      - disabled it to enable convergence tests on the vissl side
      - use `raise e from None` to get a better assertion output
        in testing.py.
      
      * [test] fix regnet test for ddp+mixed_precision
      
      - need AMP context in FSDP
      - workaround different between ddp & fsdp when bias=True
      - fixed a bug in input data generation that caused different ranks have
        the same data with wrong iteration count.
      - added TODO for need a better loss and grad_scaler and reduced
        iters so there is no nan.
      - added a (disabled) debugging code
      
      * lint
      
      * lint
      
      * add scaler
      
      * lint
      
      * scaler
      
      * add a real loss
      
      * seeding in the ranks
      
      * blance tests
      
      * run AMP DDP==FSDP test only on cuda version 11 and up
      
      * add relu inplace and comment
      
      * make wrap_bn covers more cases in full precision mode
      a0458b98
    • msbaines's avatar
      acb9ef00
  22. 30 Mar, 2021 1 commit
  23. 29 Mar, 2021 1 commit
  24. 28 Mar, 2021 1 commit
  25. 26 Mar, 2021 1 commit
  26. 25 Mar, 2021 1 commit