1. 23 Apr, 2021 1 commit
    • [FSDP] relax checking root condition (#620) · d3b86d65
      shuyingsunshine21 authored
      * relax checking root condition
      
      * formatting
      
      * add unittest
      
      * add unittest to ci test list
      
      * isort for import of unittest
      
      * format black .
      
      * move test to list 1
      
      * add skip no cuda
      
      * black and isort
  2. 22 Apr, 2021 2 commits
  3. 19 Apr, 2021 1 commit
    • FSDP: fixing training with freezing weights (#614) · 24da3b11
      Min Xu authored
      
      
      * FSDP: fixing training with freezing weights
      
      - an assert is changed to catch this case correctly
      - unit test added (based on Quentin's test code) for this case to
        compare DDP and FSDP
      
      fixes: #610
      
      * added test file to list 1
      
      * Use better and simpler code as suggested by Myle
      
      * testing both methods of freezing as well
      Co-authored-by: Min Xu <min.xu@acm.org>
  4. 13 Apr, 2021 3 commits
  5. 08 Apr, 2021 1 commit
  6. 07 Apr, 2021 2 commits
  7. 06 Apr, 2021 1 commit
  8. 04 Apr, 2021 2 commits
  9. 02 Apr, 2021 1 commit
  10. 01 Apr, 2021 1 commit
  11. 31 Mar, 2021 2 commits
    • [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556) · a0458b98
      Min Xu authored
      
      * [fix] disable single rank process group for auto_wrap_bn
      
      - beefed up unit test with regnet-like model
      - found that the single-rank process group was causing problems
      - disabled it to enable convergence tests on the vissl side
      - use `raise e from None` to get a better assertion output
        in testing.py.
      
      * [test] fix regnet test for ddp+mixed_precision
      
      - need AMP context in FSDP
      - work around a difference between ddp & fsdp when bias=True
      - fixed a bug in input data generation that caused different ranks to have
        the same data with wrong iteration count.
      - added a TODO about needing a better loss and grad_scaler, and reduced
        iters so there is no nan.
      - added (disabled) debugging code
      
      * lint
      
      * lint
      
      * add scaler
      
      * lint
      
      * scaler
      
      * add a real loss
      
      * seeding in the ranks
      
      * balance tests
      
      * run AMP DDP==FSDP test only on cuda version 11 and up
      
      * add relu inplace and comment
      
      * make wrap_bn cover more cases in full precision mode
    • msbaines's avatar
      acb9ef00
  12. 30 Mar, 2021 1 commit
  13. 26 Mar, 2021 1 commit
  14. 25 Mar, 2021 2 commits
  15. 22 Mar, 2021 1 commit
  16. 20 Mar, 2021 1 commit
  17. 19 Mar, 2021 2 commits
  18. 18 Mar, 2021 4 commits
  19. 17 Mar, 2021 1 commit
  20. 12 Mar, 2021 2 commits
  21. 11 Mar, 2021 1 commit
  22. 09 Mar, 2021 2 commits
  23. 08 Mar, 2021 1 commit
    • [fix]: handle inputs with containers in mixed precision (#486) · 2e9a14e7
      Min Xu authored
      * [fix]: handle inputs with containers
      
      - this is an issue surfaced by vissl as well
      - fix seems to be super simple
      - also cleaned up two tests with respect to multiple such tests
        running back to back (they don't do that presently)
      
      * cleanup
      
      * fix
      
      * lint
  24. 06 Mar, 2021 1 commit
  25. 05 Mar, 2021 2 commits
  26. 04 Mar, 2021 1 commit
    • [feat]: checkpoint and normalization (#457) · 5e64d6a7
      Min Xu authored
      * [feat]: checkpoint and normalization
      
      - added special handling of BN for track_running_stats and checkpointing
      - we test BN/LN and checkpointing
      - we test them with mixed precision