1. 14 Jun, 2022 1 commit
      Addition of wgit add and wgit commit functionalities. Includes refactors and new classes. (#1002) · c506e7ed
      Riyasat Ohib authored
      * [feat] Adds the implementation of the wgit add functionality, with sha1 hash creation, reference tracking, dependency graph creation, and all related functionality for the wgit add method.
      
      * [feat] Adds the wgit add and wgit commit functionalities and major refactors.
      
      1. Adds the wgit add and wgit commit functionalities to the api.
      2. Introduces a new PyGit class that wraps the internal .wgit/.git repo.
      3. Refactors the Repo class in the api, and introduces some methods.
      4. Refactors all the classes, which no longer use @staticmethods and now use object instances instead.
      5. Moved much of the directory path handling code from os.path to the pathlib library.
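
The move from os.path to pathlib mentioned in item 5 can be illustrated with a minimal sketch. The helper names and the `.wgit` sub-path used here are hypothetical and are not taken from the wgit source; the block only contrasts the two standard-library styles.

```python
import os.path
from pathlib import Path


def ref_path_os(repo_root: str, name: str) -> str:
    """os.path style: paths are plain strings joined by function calls."""
    return os.path.join(repo_root, ".wgit", name)


def ref_path_pathlib(repo_root: str, name: str) -> Path:
    """pathlib style: paths are objects composed with the / operator."""
    return Path(repo_root) / ".wgit" / name


# Both hypothetical helpers describe the same location on disk.
assert str(ref_path_pathlib("repo", "checkpoint.pt")) == ref_path_os("repo", "checkpoint.pt")
```

A Path object also exposes parts such as `.name` and `.parent` directly, which is one common reason for this kind of migration.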
      
      * [Feat] Combines the Repo and WeiGit classes. Separates classes into separate modules.
      
      1. Combines the functionalities of the WeiGit and Repo classes into a single WeiGitRepo class.
      2. Classes are now separated into their own modules.
      3. Moved some functions and staticmethods to utils.
      4. Adds a range of tests for add and commit functionalities of weigit.
      
      * [fix] adds a new test to the ci_test_list_3
      
      * [fix] test fix
      
      * [fix] test fix
      
      * [Feat] Directory restructuring, type checking and some standardization
      1. Restructured the directory and moved wgit to fairscale/experimental/wgit so that it can be found as a package when pip installed.
      2. Added a range of type checking
      3. Some refactors
      
      * [Feat][Refactor] Directory restructuring, test addition and type checking
      1. Restructured the test directory
      2. Added and modified a few wgit tests.
      3. Added some type checking to the code
      
      * test fix
      
      * "setup fix and repo checking added in cli"
      
      * [Feat] Better initialization and error handling for init and wgit subcommands. Test reorg.
      
      * [refactor] Changes in classes, encapsulation and addition of PyGit test.
      
      * [Feat][Refactor]
      1. Changed some class method arguments for better encapsulation for Sha1_store.
      2. Moved sha1 hash calculation within sha1_store.
      3. Some standardization and code clean up of unnecessary snippets.
      4. Added new tests for the PyGit and Sha1_Store class.
  2. 25 May, 2022 1 commit
  3. 06 Dec, 2021 1 commit
      Fix for Key Error that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) (#881) · e6acdcc3
      Freddy Snijder authored
      
      * Fix for Key Error that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876)
      
      * Styling fixes
      
      * Updated the test to be independent of the Huggingface transformers package
      
      * Added test for issue #876
      
      * Small error message fix
      
      * Skip test when CUDA is not available
      
      * Fixed naming of model
  4. 18 Nov, 2021 1 commit
  5. 08 Nov, 2021 1 commit
  6. 07 May, 2021 1 commit
  7. 31 Mar, 2021 1 commit
      [fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556) · a0458b98
      Min Xu authored
      
      * [fix] disable single rank process group for auto_wrap_bn
      
      - beefed up unit test with regnet-like model
      - found that single-rank process group is causing problem
      - disabled it to enable convergence tests on the vissl side
      - use `raise e from None` to get a better assertion output
        in testing.py.
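
The `raise ... from None` idiom mentioned above suppresses the chained "During handling of the above exception..." section of a traceback, which tends to make the final error easier to read. The sketch below is a generic illustration with a hypothetical `lookup` helper; it is not the code in testing.py.

```python
def lookup(table: dict, key: str) -> object:
    """Hypothetical helper that reports a missing key with a cleaner error."""
    try:
        return table[key]
    except KeyError:
        # "from None" hides the original KeyError from the printed
        # traceback, so only the ValueError message is shown.
        raise ValueError(f"unknown key: {key!r}") from None


try:
    lookup({}, "x")
except ValueError as err:
    # The original exception is still recorded on __context__,
    # but it is marked as suppressed for display.
    assert err.__cause__ is None
    assert err.__suppress_context__ is True
```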
      
      * [test] fix regnet test for ddp+mixed_precision
      
      - need AMP context in FSDP
      - work around the difference between ddp & fsdp when bias=True
      - fixed a bug in input data generation that caused different ranks to have
        the same data with the wrong iteration count.
      - added a TODO about needing a better loss and grad_scaler, and reduced
        iters so there is no nan.
      - added (disabled) debugging code
      
      * lint
      
      * lint
      
      * add scaler
      
      * lint
      
      * scaler
      
      * add a real loss
      
      * seeding in the ranks
      
      * balance tests
      
      * run AMP DDP==FSDP test only on cuda version 11 and up
      
      * add relu inplace and comment
      
      * make wrap_bn cover more cases in full precision mode
  8. 25 Mar, 2021 1 commit
  9. 18 Mar, 2021 1 commit
      [feature] FSDP: enable pytorch SyncBN (#527) · 2fc1f6d8
      Min Xu authored
      * [feature] FSDP: enable pytorch SyncBN
      
      - not fully validated yet but at least not asserting
      - this enables VISSL to move forward with its next PR
      
      * add the test file
      
      * changelog and lint
      
      * addressed comment
  10. 12 Mar, 2021 1 commit
  11. 08 Mar, 2021 1 commit
      [fix]: handle inputs with containers in mixed precision (#486) · 2e9a14e7
      Min Xu authored
      * [fix]: handle inputs with containers
      
      - this is an issue surfaced by vissl as well
      - fix seems to be super simple
      - also cleaned up two tests with respect to multiple such tests
        running back to back (they don't do that presently)
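
The log does not show the fix itself. One common way to handle inputs that arrive inside containers is to walk the nested structure and convert only the leaves, as in the sketch below. The helper name `apply_to_leaves` is hypothetical, and plain Python floats stand in for tensors so the example has no dependencies. It should not be read as the FSDP implementation.

```python
from typing import Any, Callable


def apply_to_leaves(fn: Callable[[Any], Any], obj: Any) -> Any:
    """Apply fn to every leaf of nested dicts, lists, and tuples.

    Assumption for this sketch: only these three container types are
    handled; namedtuples and custom containers would need extra cases.
    """
    if isinstance(obj, dict):
        return {k: apply_to_leaves(fn, v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type around the converted items.
        return type(obj)(apply_to_leaves(fn, v) for v in obj)
    return fn(obj)


# Stand-in for a precision cast: convert floats, leave other leaves alone.
batch = {"images": [1.5, 2.5], "meta": ("id", 3.0)}
converted = apply_to_leaves(lambda x: int(x) if isinstance(x, float) else x, batch)
assert converted == {"images": [1, 2], "meta": ("id", 3)}
```

With real tensors, the leaf function would typically check for a floating-point tensor and cast it, leaving integer tensors and non-tensor values untouched.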
      
      * cleanup
      
      * fix
      
      * lint
  12. 06 Mar, 2021 1 commit
  13. 05 Mar, 2021 1 commit
  14. 04 Mar, 2021 1 commit
      [feat]: checkpoint and normalization (#457) · 5e64d6a7
      Min Xu authored
      * [feat]: checkpoint and normalization
      
      - added special handling of BN for track_running_stats and checkpointing
      - we test BN/LN and checkpointing
      - we test them with mixed precision
  15. 01 Mar, 2021 1 commit
      [chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7
      Min Xu authored
      * [chores]: CI py39 on GPU and more efficiency
      
      * add test list files
      
      * fix
      
      * add test list files
      
      * split benchmark run into 2 runs
      
      * fix 1.8 version and balance benchmarks
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * recording tests
      
      * py39 install fix
      
      * test again
      
      * move tests
      
      * reorg tests
      
      * skip tests for torch 1.8 due to an upstream bug
      
      * removed __init__.py from tests since it confuses pytest
      
      * Revert "removed __init__.py from tests since it confuses pytest"
      
      This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
      
      * don't include __init__ in file list
      
      * notes on __init__.py and added missing ones
      
      * fixed mypy in a test file
      
      * balance test runtime
      
      * better pip install
      
      * balance more
      
      * pip fix
      
      * balance
      
      * balance more, all tests should finish within 20m now
      
      * minor license update
      
      * trying cu102
      
      * more doc and addressed Ben's comments
      
      * debugging
      
      * debugging
      
      * better capture the errors
      
      * debugging
      
      * fix pyenv command
      
      * add universe repo
      
      * update to cuda 11 for 171
      
      * add a test file, improved the checking script