1. 28 Apr, 2021 1 commit
    • [feat] save memory by using bucket buffer only in backward (#633) · a5594032
      Min Xu authored
      
      
      * [feat] save memory by using bucket buffer only in backward
      
      - this fixes bug #627
      - added documentation to clarify the buffer's cost and speed/memory
        tradeoff
      - added setup/teardown calls so that the buffer is only allocated
        during the backward pass, freeing more memory during the forward pass
        and the optimizer step so that it can be used for things like
        activations.
      - added a unit test that asserts the memory usage is in range.
      
      Comparing with DDP:
      
        1. buffer size scales with the number of FSDP instances, not with model size
        2. buffer is only allocated during the backward pass
        3. buffer is used for small tensors only, to reduce overhead
        4. the overlap of compute and gradient reduction is very different
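      A minimal sketch of the idea behind points 2 and 3 (illustrative only, not
      the actual FSDP internals; the GradBucket name is hypothetical): the flat
      buffer is created right before backward, used to batch small gradients for
      reduction, and released afterwards.
      
      ```
      import torch
      
      class GradBucket:
          """Flat buffer that only lives for the duration of the backward pass."""
      
          def __init__(self, numel: int, dtype=torch.float32, device="cpu"):
              self.numel = numel
              self.dtype = dtype
              self.device = device
              self.buffer = None  # nothing allocated until backward starts
      
          def setup(self):
              # Called at the start of backward: allocate the flat buffer.
              self.buffer = torch.zeros(self.numel, dtype=self.dtype, device=self.device)
      
          def teardown(self):
              # Called once gradient reduction is done: release the memory so it
              # can be reused for activations in forward and for the optimizer step.
              self.buffer = None
      
          def pack_small_grads(self, grads, threshold=2048):
              # Only small gradient tensors go through the bucket; large ones
              # would be reduced directly to avoid extra copies.
              small = [g.reshape(-1) for g in grads if g.numel() <= threshold]
              flat = torch.cat(small) if small else torch.empty(0, dtype=self.dtype)
              self.buffer[: flat.numel()].copy_(flat)
              return self.buffer[: flat.numel()]
      ```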
      
      * add PR number to changelog
      
      * filled in with memory number on 1.9
      
      * addressed comments
      
      * update comments
      
      * fix for 1.6
      
      * add a todo
      Co-authored-by: Min Xu <min.xu@acm.org>
  2. 26 Apr, 2021 1 commit
  3. 23 Apr, 2021 1 commit
    • [FSDP] relax checking root condition (#620) · d3b86d65
      shuyingsunshine21 authored
      * relax checking root condition
      
      * formatting
      
      * add unittest
      
      * add unittest to ci test list
      
      * isort for import of unittest
      
      * format black .
      
      * move test to list 1
      
      * add skip no cuda
      
      * black and isort
  4. 22 Apr, 2021 3 commits
  5. 21 Apr, 2021 1 commit
  6. 20 Apr, 2021 1 commit
  7. 19 Apr, 2021 1 commit
    • FSDP: fixing training with freezing weights (#614) · 24da3b11
      Min Xu authored
      
      
      * FSDP: fixing training with freezing weights
      
      - an assert is changed to catch this case correctly
      - added a unit test (based on Quentin's test code) for this case that
        compares DDP and FSDP
      
      fixes: #610
      
      * added test file to list 1
      
      * Use better and simpler code as suggested by Myle
      
      * testing both methods of freezing as well (the two methods are sketched below)
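      A hedged sketch of the two freezing approaches such a test can compare
      (illustrative only, not the actual test code):
      
      ```
      import torch
      from torch import nn
      
      model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
      trunk, head = model[0], model[1]
      
      # Method 1: turn off gradients on the frozen trunk.
      for p in trunk.parameters():
          p.requires_grad = False
      
      # Method 2: keep requires_grad=True, but hand only the head's parameters
      # to the optimizer, so the trunk is never updated.
      optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
      
      # The model (or its submodules) can then be wrapped with DDP or FSDP and
      # the resulting parameters compared between the two after a few steps.
      ```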
      Co-authored-by: Min Xu <min.xu@acm.org>
  8. 14 Apr, 2021 1 commit
  9. 13 Apr, 2021 1 commit
  10. 08 Apr, 2021 1 commit
  11. 07 Apr, 2021 1 commit
  12. 04 Apr, 2021 1 commit
  13. 03 Apr, 2021 1 commit
  14. 31 Mar, 2021 1 commit
    • [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556) · a0458b98
      Min Xu authored
      
      * [fix] disable single rank process group for auto_wrap_bn
      
      - beefed up unit test with regnet-like model
      - found that the single-rank process group was causing problems
      - disabled it to enable convergence tests on the vissl side
      - use `raise e from None` to get a better assertion output
        in testing.py.
      
      * [test] fix regnet test for ddp+mixed_precision
      
      - the AMP context is needed with FSDP
      - worked around a difference between DDP & FSDP when bias=True
      - fixed a bug in input data generation that caused different ranks to
        have the same data and a wrong iteration count
      - added a TODO about needing a better loss and grad_scaler; reduced the
        number of iterations so there is no NaN
      - added (disabled) debugging code
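      A hedged, simplified sketch of that test pattern (illustrative only, not
      the actual regnet test; a sharded-aware grad scaler may be needed for the
      real FSDP case): the forward pass runs under autocast, a grad scaler
      handles loss scaling, and each rank seeds its data generation differently.
      
      ```
      import torch
      from torch.cuda.amp import autocast, GradScaler
      
      def run_rank(rank, model, optimizer, num_iters=3, device="cuda"):
          # `model` stands in for the DDP/FSDP-wrapped network under test
          # (here assumed to map 8 -> 2 features).
          # Seed per rank so the randomly generated inputs differ across ranks.
          torch.manual_seed(1234 + rank)
          scaler = GradScaler()
          for _ in range(num_iters):
              batch = torch.randn(4, 8, device=device)
              target = torch.randn(4, 2, device=device)
              optimizer.zero_grad()
              with autocast():  # AMP context wraps the forward pass and the loss
                  loss = torch.nn.functional.mse_loss(model(batch), target)
              scaler.scale(loss).backward()
              scaler.step(optimizer)
              scaler.update()
      ```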
      
      * lint
      
      * lint
      
      * add scaler
      
      * lint
      
      * scaler
      
      * add a real loss
      
      * seeding in the ranks
      
      * balance tests
      
      * run the AMP DDP==FSDP test only on CUDA version 11 and up
      
      * add relu inplace and a comment
      
      * make wrap_bn cover more cases in full precision mode
  15. 25 Mar, 2021 1 commit
  16. 20 Mar, 2021 1 commit
  17. 18 Mar, 2021 3 commits
  18. 17 Mar, 2021 1 commit
  19. 12 Mar, 2021 1 commit
  20. 09 Mar, 2021 3 commits
  21. 08 Mar, 2021 3 commits
  22. 06 Mar, 2021 1 commit
  23. 04 Mar, 2021 1 commit
  24. 02 Mar, 2021 2 commits
    • d2924670
      Myle Ott authored
    • [feat] Add context manager to FSDP for easier child module wrapping (#446) · f3359550
      Sean Naren authored
      This adds a context manager that makes it easier to wrap child modules with shared defaults.
      Usage:
      ```
      from fairscale.nn.misc import enable_wrap, wrap
      
      with enable_wrap(**handleful_of_important_params):
          layer_1 = wrap(torch.nn.Linear(5, 5))
          layer_2 = wrap(torch.nn.Linear(5, 5), flatten_parameters=True) # Override parameters if you'd like
      
      # outside the context manager, wrap() is a no-op and returns the plain Linear layer
      layer_1 = wrap(torch.nn.Linear(5, 5))
      ```
      If not within the FSDP context, this would be a no-op. This makes it easier to annotate layers without having to repeat the same parameters for every wrapped module.
  25. 01 Mar, 2021 1 commit
  26. 27 Feb, 2021 1 commit
  27. 26 Feb, 2021 2 commits
  28. 25 Feb, 2021 1 commit
  29. 24 Feb, 2021 1 commit
  30. 23 Feb, 2021 1 commit