1. 15 Nov, 2021 1 commit
    • Allow sharded grad scaler to cpu offload with FSDP (#831) · ba5785f7
      Anupam Bhatnagar authored
      * first commit
      
      * sharded scaler hitting nan assertions
      
      * adding test for sharded grad scaler without cpu offload
      
      * ddp grad scaler and fsdp sharded grad scaler test failing
      
      * removing test_output
      
      * fix no cpu offload test
      
      * changing optimizer from OSS to SGD
      
      * all tests passing, code cleanup pending
      
      * code cleanup
      
      * fix pyproject.toml
      
      * removing .isort.cfg
      
      * running isort linter
      
      * resolving isort issues
      
      * resolving black linter issue
      
      * resolving mypy issues
      
      * fix import statement
      
      * fix mypy error
      
      * modifying import statement
      
      * adding pytorch version requirement
      
      * fixing pytest skip test decorator
      
      * apply version guard for ShardedGradScaler
      
      * removing test_fsdp_grad_scaler
      
      * increasing num_epochs for ShardedGradScaler so that updates are not skipped
      
      * adding support for torch 1.8
      
      * minor edit
      
      * [skip ci] more torch 1.8 changes
      
      * parametrizing the tests
      
      * cleanup code with linters
      
      * [skip ci] update doc string
      
      * [skip ci] addressing some more comments
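
      A minimal sketch of the training-step pattern this PR enables: FSDP with CPU
      offload feeding fairscale's ShardedGradScaler. The tiny model, the flag
      placement, and the loop are illustrative assumptions, not the PR's test code.

          import torch
          from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
          from fairscale.optim.grad_scaler import ShardedGradScaler

          # Assumption: move_params_to_cpu is the cpu-offload flag, paired with
          # mixed_precision in fairscale's FSDP.
          model = FSDP(torch.nn.Linear(8, 8).cuda(), mixed_precision=True,
                       move_params_to_cpu=True)
          optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # the tests moved from OSS to SGD
          scaler = ShardedGradScaler()

          for _ in range(4):  # several steps, so scaler updates are not all skipped
              inputs = torch.randn(2, 8).cuda()
              loss = model(inputs).sum()
              scaler.scale(loss).backward()  # scaled backward; fp32 shards may live on CPU
              scaler.step(optimizer)         # sharded inf/nan check, then unscale + step
              scaler.update()
              optimizer.zero_grad()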
  2. 12 Nov, 2021 1 commit
    • Setup pre-commit github action and apply pre-commit to all files (#849) · 7d7edf6d
      Anupam Bhatnagar authored
      * adding pre-commit files
      
      * applying pre-commit to all files
      
      * adding no-strict-optional argument to mypy in circle ci config
      
      * fix typo
      
      * updating python versions
      
      * [skip ci] remove extra args
      
      * adding python 3.9
      
      * [skip ci] set pre-commit version in requirements-dev.txt
      
      * set CACHE_VERSION
      
      * move linters from circleci to github actions
      
      * update python version
      
      * update python version in benchmarks_2
      
      * moving to python 3.9.7
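
      For reference, a minimal .pre-commit-config.yaml sketch showing the shape of
      such a setup; the hook repos and revs are illustrative assumptions, not the
      exact set pinned by this PR.

          # .pre-commit-config.yaml (illustrative hook set)
          repos:
            - repo: https://github.com/psf/black
              rev: 21.10b0   # assumed rev; pin whatever the project uses
              hooks:
                - id: black
            - repo: https://github.com/PyCQA/isort
              rev: 5.10.1    # assumed rev
              hooks:
                - id: isort

      Applying the hooks to the whole tree, as the PR title describes, is then
      `pre-commit run --all-files`.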
  3. 26 Jun, 2021 1 commit
  4. 28 Apr, 2021 1 commit
    • [test] improve BN test coverage (#638) · 21cba91b
      Min Xu authored
      * [test] improve BN test coverage
      
      - Added sync_bn on/off cases
      - Added conv and linear bias on/off cases
      - clarified, via the test, when BN wrapping is needed while sync_bn is off
      
      * adding a comment
      Co-authored-by: Min Xu <min.xu@acm.org>
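
      A sketch of how the added coverage axes can be parametrized; the test name
      and tensor shapes are illustrative, not the PR's actual test code.

          import pytest
          import torch

          @pytest.mark.parametrize("sync_bn", [True, False])
          @pytest.mark.parametrize("bias", [True, False])
          def test_bn_coverage(sync_bn, bias):
              model = torch.nn.Sequential(
                  torch.nn.Conv2d(3, 4, kernel_size=3, bias=bias),
                  torch.nn.BatchNorm2d(4),
              )
              if sync_bn:
                  # conversion is single-process safe; forward would need dist init
                  model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
                  assert isinstance(model[1], torch.nn.SyncBatchNorm)
              else:
                  out = model(torch.randn(2, 3, 8, 8))
                  assert out.shape == (2, 4, 6, 6)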
  5. 22 Apr, 2021 1 commit
    • [fix] mypy and flaky test (#624) · 961df76e
      Min Xu authored
      * [fix] mypy and flaky test
      
      - CI didn't seem to catch this or maybe I merged incorrectly yesterday
      - this should fix the mypy error on master
      - also updated a test that seems to be flaky due to tcp port conflict
      
      * another flaky test, hopefully more determinism helps
      
      * CR
      
      * skip 1.6
      
      * fix
      
      * minor
      Co-authored-by: Min Xu <min.xu@acm.org>
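
      The usual fix for the tcp-port-conflict flakiness mentioned above is to ask
      the OS for a free port rather than hard-coding one; a sketch (fairscale's
      own test helper may differ):

          import socket

          def find_free_port() -> int:
              with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                  s.bind(("127.0.0.1", 0))  # port 0: the OS picks an unused port
                  return s.getsockname()[1]

          # e.g. torch.distributed.init_process_group("gloo", init_method=url, ...)
          url = f"tcp://127.0.0.1:{find_free_port()}"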
  6. 31 Mar, 2021 1 commit
    • [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556) · a0458b98
      Min Xu authored
      
      * [fix] disable single rank process group for auto_wrap_bn
      
      - beefed up unit test with regnet-like model
      - found that single-rank process group is causing problem
      - disabled it to enable convergence tests on the vissl side
      - use `raise e from None` to get a better assertion output
        in testing.py.
      
      * [test] fix regnet test for ddp+mixed_precision
      
      - need AMP context in FSDP
      - workaround for the difference between ddp & fsdp when bias=True
      - fixed a bug in input data generation that caused different ranks to have
        the same data and a wrong iteration count.
      - added a TODO noting the need for a better loss and grad_scaler; reduced
        iters so there is no nan.
      - added (disabled) debugging code
      
      * lint
      
      * lint
      
      * add scaler
      
      * lint
      
      * scaler
      
      * add a real loss
      
      * seeding in the ranks
      
      * balance tests
      
      * run AMP DDP==FSDP test only on cuda version 11 and up
      
      * add relu inplace and comment
      
      * make wrap_bn cover more cases in full precision mode
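
      A sketch of the per-rank seeding fix noted above: without offsetting the
      seed by rank, every rank draws identical "random" batches. The helper name
      is illustrative.

          import torch

          def seed_rank(base_seed: int, rank: int) -> None:
              # distinct generator stream per rank -> distinct input data per rank
              torch.manual_seed(base_seed + rank)

          seed_rank(42, rank=0)
          batch = torch.randn(2, 3, 32, 32)  # now differs across ranks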
  7. 26 Mar, 2021 1 commit
  8. 18 Mar, 2021 2 commits
    • [feat] FSDP: add auto_wrap_bn (#531) · 8b59267b
      Min Xu authored
      * [feat] FSDP: add auto_wrap_bn
      
      - add a utility function to handle wrapping of BN
      
      * changelog
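
      A hedged usage sketch: auto_wrap_bn gives BN layers their own wrappers so
      they can stay in full precision under a mixed-precision outer FSDP. The
      import path and exact signature are assumptions, not confirmed by this log.

          import torch
          from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
          from fairscale.nn.data_parallel import auto_wrap_bn  # assumed import path

          model = torch.nn.Sequential(torch.nn.Conv2d(3, 4, 3), torch.nn.BatchNorm2d(4))
          model = auto_wrap_bn(model)  # BN submodules get wrapped first
          # model = FSDP(model.cuda(), mixed_precision=True)  # outer wrap; needs dist init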
    • [feature] FSDP: enable pytorch SyncBN (#527) · 2fc1f6d8
      Min Xu authored
      * [feature] FSDP: enable pytorch SyncBN
      
      - not fully validated yet but at least not asserting
      - this enables VISSL to move forward with its next PR
      
      * add the test file
      
      * changelog and lint
      
      * addressed comment
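
      The enabled pattern, sketched: convert BN layers to PyTorch SyncBatchNorm
      before the FSDP wrap. convert_sync_batchnorm is the real torch API; the
      surrounding setup is assumed.

          import torch

          model = torch.nn.Sequential(torch.nn.Conv2d(3, 4, 3), torch.nn.BatchNorm2d(4))
          model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
          # then: model = FullyShardedDataParallel(model.cuda())  # needs dist init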
  9. 12 Mar, 2021 1 commit