1. 28 Apr, 2021 4 commits
    • [test] improve BN test coverage (#638) · 21cba91b
      Min Xu authored
      
      
      * [test] improve BN test coverage
      
      - Added sync_bn on/off cases
      - Added conv and linear bias on/off cases
      - used the test to clarify when BN wrapping is needed if sync_bn is off
      
      * adding a comment
      Co-authored-by: Min Xu <min.xu@acm.org>
    • adding auto graph generation for distributed pipeline (#615) · bdc0581b
      Mehdi Mirzazadeh authored
      * adding auto graph generation for distributed pipeline
      
      * ignore trace.py for mypy for now, since it needs PyTorch 1.8
      
      * fixing tests
      
      * simplifying graph api
      
      * remove unused debug utilities
      
      * use inspect to find argument lists
      
      * use sharded linear layer
      
      * flake8
      
      * comment
      
      * polishing
      
      * polishing
    • 2bb2a134
      msbaines authored
    • [feat] save memory by using bucket buffer only in backward (#633) · a5594032
      Min Xu authored
      
      
      * [feat] save memory by using bucket buffer only in backward
      
      - this fixes bug #627
      - added documentation to clarify the buffer's cost and speed/memory
        tradeoff
      - added setup/teardown calls so that the buffer is only allocated
        during the backward pass, freeing that memory during forward and
        stepping for things like activations.
      - added a unit test that asserts the memory is in range.
      
      Comparing with DDP:
      
        1. buffer size scales with the number of FSDP instances, not model size
        2. buffer is only allocated during backward
        3. buffer is used for small tensors only to reduce overhead
        4. overlapping of compute-reduction is very different
      
      * add PR number to changelog
      
      * filled in with memory number on 1.9
      
      * addressed comments
      
      * update comments
      
      * fix for 1.6
      
      * add a todo
      Co-authored-by: Min Xu <min.xu@acm.org>
  2. 27 Apr, 2021 1 commit
  3. 26 Apr, 2021 4 commits
  4. 23 Apr, 2021 2 commits
  5. 22 Apr, 2021 3 commits
  6. 21 Apr, 2021 2 commits
  7. 20 Apr, 2021 1 commit
  8. 19 Apr, 2021 2 commits
  9. 15 Apr, 2021 3 commits
  10. 14 Apr, 2021 1 commit
  11. 13 Apr, 2021 4 commits
  12. 09 Apr, 2021 1 commit
  13. 08 Apr, 2021 1 commit
  14. 07 Apr, 2021 3 commits
  15. 06 Apr, 2021 1 commit
  16. 05 Apr, 2021 3 commits
  17. 04 Apr, 2021 3 commits
  18. 03 Apr, 2021 1 commit