1. 05 May, 2021 2 commits
    • add info about PEP8 style guide (#651) · 0ce85af2
      anj-s authored
      
      Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
    • [fix] add clear_autocast_cache flag (#650) · 861b5ce2
      Min Xu authored
      
      
      * [fix] add clear_autocast_cache flag
      
      - when training in AMP mode with fp32 weights, FSDP may need to
        optionally clear the autocast cache to avoid GPU OOM
      - this flag defaults to false; clearing the cache automatically is a
        future TODO
      - also added a verbose flag to make print(fsdp_model) output a bit
        shorter
      - updated the memory test to cover the new code
      - added a couple of useful functions in parallel.py and testing.py
      
      * minor
      
      * address comments
      
      * format
      
      * improve the test
      Co-authored-by: Min Xu <min.xu@acm.org>
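The pattern this commit describes (autocast holds downcast weight copies during forward; an opt-in flag frees them to cap peak memory) can be sketched in plain Python. All class and variable names below are illustrative, not fairscale's actual implementation:

```python
# Illustrative sketch of the clear_autocast_cache flag pattern.
# AutocastCache stands in for the framework's cast cache; Wrapper
# stands in for the FSDP-style wrapper carrying the opt-in flag.

class AutocastCache:
    """Caches 'downcast' copies of weights so repeated uses are free."""
    def __init__(self):
        self.casts = {}

    def cast(self, name, weight):
        if name not in self.casts:
            self.casts[name] = list(weight)  # pretend fp32 -> fp16 copy
        return self.casts[name]

    def clear(self):
        self.casts.clear()


class Wrapper:
    def __init__(self, weights, clear_autocast_cache=False):
        self.weights = weights
        self.clear_autocast_cache = clear_autocast_cache  # default False
        self.cache = AutocastCache()

    def forward(self, x):
        out = x
        for name, w in self.weights.items():
            w16 = self.cache.cast(name, w)
            out = [o + sum(w16) for o in out]
        if self.clear_autocast_cache:
            # Trade a re-cast next iteration for lower peak memory now.
            self.cache.clear()
        return out


m = Wrapper({"layer0": [1.0, 2.0]}, clear_autocast_cache=True)
m.forward([0.0])
print(len(m.cache.casts))  # 0: cache freed after forward
```

In real AMP training the analogous call is `torch.clear_autocast_cache()`; clearing costs a re-cast of the weights on the next iteration in exchange for a lower peak GPU memory footprint.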
  2. 04 May, 2021 1 commit
  3. 03 May, 2021 2 commits
  4. 30 Apr, 2021 1 commit
  5. 29 Apr, 2021 2 commits
  6. 28 Apr, 2021 4 commits
    • [test] improve BN test coverage (#638) · 21cba91b
      Min Xu authored
      
      
      * [test] improve BN test coverage
      
      - Added sync_bn on/off cases
      - Added conv and linear bias on/off cases
      - clarified in the test when BN wrapping is needed while sync_bn is off
      
      * adding a comment
      Co-authored-by: Min Xu <min.xu@acm.org>
    • adding auto graph generation for distributed pipeline (#615) · bdc0581b
      Mehdi Mirzazadeh authored
      * adding auto graph generation for distributed pipeline
      
      * ignore trace.py for mypy for now, since it needs pytorch 1.8
      
      * fixing tests
      
      * simplifying graph api
      
      * remove unused debug utilities
      
      * use inspect to find argument lists
      
      * use sharded linear layer
      
      * flake8
      
      * comment
      
      * polishing
      
      * polishing
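One step in this commit ("use inspect to find argument lists") relies on the stdlib: when auto-generating a pipeline graph, each node needs to know which inputs its callable expects, and `inspect.signature` recovers that from the function itself. A minimal sketch (the helper name is hypothetical, not fairscale's API):

```python
import inspect

def arg_names(fn):
    """Return the parameter names of fn, skipping self for bound methods."""
    params = inspect.signature(fn).parameters
    return [name for name in params if name != "self"]

# A stand-in for a pipeline stage whose inputs must be discovered
# automatically when wiring the graph.
def linear(x, weight, bias):
    return x * weight + bias

print(arg_names(linear))  # ['x', 'weight', 'bias']
```

Discovering argument lists this way lets the graph builder connect producer outputs to consumer inputs by name instead of requiring users to declare the wiring by hand.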
    • msbaines authored
      2bb2a134
    • [feat] save memory by using bucket buffer only in backward (#633) · a5594032
      Min Xu authored
      
      
      * [feat] save memory by using bucket buffer only in backward
      
      - this fixes bug #627
      - added documentation to clarify the buffer's cost and the speed/memory
        tradeoff
      - added setup/teardown calls so that the buffer is only allocated
        during the backward pass, freeing more memory for the forward pass
        and optimizer step, e.g. for activations
      - added a unit test that asserts the memory usage is in range
      
      Comparing with DDP:
      
        1. buffer size scales with the number of FSDP instances, not model size
        2. buffer is only allocated during backward
        3. buffer is used for small tensors only, to reduce overhead
        4. overlapping of compute and reduction is very different
      
      * add PR number to changelog
      
      * filled in memory numbers on 1.9
      
      * addressed comments
      
      * update comments
      
      * fix for 1.6
      
      * add a todo
      Co-authored-by: Min Xu <min.xu@acm.org>
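The setup/teardown bucketing idea above can be sketched in plain Python: a flat buffer is allocated when backward starts, small gradients are packed into it and reduced together, large gradients bypass it, and the buffer is freed when backward ends so forward and the optimizer step can reuse the memory. This is an illustrative toy, not fairscale's code; all names and the list-based "tensors" are stand-ins:

```python
class GradBucket:
    """Toy gradient bucket that only holds memory during backward."""

    def __init__(self, capacity, small_threshold):
        self.capacity = capacity              # flat buffer size (elements)
        self.small_threshold = small_threshold  # only bucket grads <= this
        self.buffer = None
        self.offset = 0
        self.flushed = []                     # stands in for issued reductions

    def setup(self):
        # Called when backward starts: allocate the flat buffer.
        self.buffer = [0.0] * self.capacity
        self.offset = 0

    def push(self, grad):
        if len(grad) > self.small_threshold:
            # Large gradients are reduced on their own; bucketing them
            # would add copies without amortizing launch overhead.
            self.flushed.append(list(grad))
            return
        if self.offset + len(grad) > self.capacity:
            self.flush()
        self.buffer[self.offset:self.offset + len(grad)] = grad
        self.offset += len(grad)

    def flush(self):
        # Stands in for one fused reduction over the packed gradients.
        if self.offset:
            self.flushed.append(self.buffer[:self.offset])
            self.offset = 0

    def teardown(self):
        # Called when backward ends: drain and free the buffer.
        self.flush()
        self.buffer = None


b = GradBucket(capacity=4, small_threshold=3)
b.setup()
b.push([1.0, 2.0])
b.push([3.0, 4.0, 5.0, 6.0])  # too large: reduced on its own
b.teardown()
print(b.buffer)  # None: memory freed outside backward
```

Because the buffer lives per wrapper instance and only between setup and teardown, its footprint scales with the number of FSDP instances rather than model size, matching the comparison in the commit message.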
  7. 27 Apr, 2021 1 commit
  8. 26 Apr, 2021 4 commits
  9. 23 Apr, 2021 2 commits
  10. 22 Apr, 2021 3 commits
  11. 21 Apr, 2021 2 commits
  12. 20 Apr, 2021 1 commit
  13. 19 Apr, 2021 2 commits
  14. 15 Apr, 2021 3 commits
  15. 14 Apr, 2021 1 commit
  16. 13 Apr, 2021 4 commits
  17. 09 Apr, 2021 1 commit
  18. 08 Apr, 2021 1 commit
  19. 07 Apr, 2021 3 commits