- 14 Jun, 2022 1 commit
-
-
Riyasat Ohib authored
* [feat] Adds the implementation of the wgit add functionality, with sha1 hash creation, reference tracking, dependency graph creation and all related functionality for the wgit add method.
* [feat] Adds the wgit add and wgit commit functionalities and major refactors.
  1. Adds the wgit add and wgit commit functionalities to the api.
  2. Introduces a new PyGit class that wraps the internal .wgit/.git repo.
  3. Refactors the Repo class in the api, and introduces some methods.
  4. Refactors all the classes, which no longer use @staticmethods and now use object instances instead.
  5. Moved much of the directory path handling code from os.path to the pathlib library.
* [Feat] Combines the Repo and WeiGit classes. Separates classes into separate modules.
  1. Combines the functionalities of the WeiGit and Repo classes into a single WeiGitRepo class.
  2. Classes are now separated into their own modules.
  3. Moved some functions and staticmethods to utils.
  4. Adds a range of tests for the add and commit functionalities of weigit.
* [fix] adds a new test to the ci_test_list_3
* [fix] test fix
* [fix] test fix
* [Feat] Directory restructuring, type checking and some standardization
  1. Restructured the directory and moved wgit to fairscale/experimental/wgit so that it can be found as a package when pip installed.
  2. Added a range of type checking
  3. Some refactors
* [Feat][Refactor] Directory restructuring, test addition and type checking
  1. Restructured the test directory
  2. Added and modified a few wgit tests.
  3. Added some type checking to the code
* test fix
* "setup fix and repo checking added in cli"
* [Feat] Better initialization and error handling for init and wgit subcommands. Test reorg.
* [refactor] Changes in classes, encapsulation and addition of PyGit test.
* [Feat][Refactor]
  1. Changed some class method arguments for better encapsulation for Sha1_store.
  2. Moved sha1 hash calculation within sha1_store.
  3. Some standardization and code clean up of unnecessary snippets.
  4. Added new tests for the PyGit and Sha1_Store class.
-
- 25 May, 2022 1 commit
-
-
Riyasat Ohib authored
* [feat] Adding wgit within fairscale/experimental/wgit.
* [feat] adding experimental wgit
-
- 06 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Fix for Key Error that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) (#881)
* Fix for Key Error that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876)
* Styling fixes
* Updated the test to be independent of the Huggingface transformers package
* Added test for issue #876
* Small error message fix
* Skip test when CUDA is not available
* Fixed naming of model
-
- 18 Nov, 2021 1 commit
-
-
Min Xu authored
* [fix]: fix eval for shared weight FSDP
* fixing optim state saving
* add changelog
* reformat with newer local isort
* update test
* avoid computing reference state unless we are testing training
* added optim_state test
* make mypy happy
* move tests; maybe we need to CUDA memory related tests in the first of the lists

Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 08 Nov, 2021 1 commit
-
-
Benjamin Lefaudeux authored
Add SlowMo Distributed Data Parallel for clusters with slow interconnects

Co-authored-by: Vinayak Tantia <tantia.vinayak1@gmail.com>
-
- 07 May, 2021 1 commit
-
-
Min Xu authored
* [test]: add a more general test case
  - also rebalance the tests a bit
* added missing arg
* balance
* better checking
* balance
* make test smaller and faster
* make ddp results cached and enable sync_bn
* clean up
* fix tests
* changelog
* balance
* fix
* addressing comments

Co-authored-by: Min Xu <min.xu@acm.org>
-
- 31 Mar, 2021 1 commit
-
-
Min Xu authored
[fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556)
* [fix] disable single rank process group for auto_wrap_bn
  - beefed up unit test with regnet-like model
  - found that single-rank process group is causing problems
  - disabled it to enable convergence tests on the vissl side
  - use `raise e from None` to get a better assertion output in testing.py.
* [test] fix regnet test for ddp+mixed_precision
  - need AMP context in FSDP
  - work around differences between ddp & fsdp when bias=True
  - fixed a bug in input data generation that caused different ranks to have the same data with wrong iteration count.
  - added TODO for a better loss and grad_scaler, and reduced iters so there is no nan.
  - added (disabled) debugging code
* lint
* lint
* add scaler
* lint
* scaler
* add a real loss
* seeding in the ranks
* balance tests
* run AMP DDP==FSDP test only on cuda version 11 and up
* add relu inplace and comment
* make wrap_bn cover more cases in full precision mode
-
- 25 Mar, 2021 1 commit
-
-
Sam Shleifer authored
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 18 Mar, 2021 1 commit
-
-
Min Xu authored
* [feature] FSDP: enable pytorch SyncBN
  - not fully validated yet but at least not asserting
  - this enables VISSL to move forward with its next PR
* add the test file
* changelog and lint
* addressed comment
-
- 12 Mar, 2021 1 commit
-
-
Min Xu authored
* FSDP: multi-pass autograd graph and mixed precision
  - added BACKWARD_PRE/POST checking
  - better assert_state
  - fixed issue of backward hook misfiring
* fix
* cleanup
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

Co-authored-by: Myle Ott <myleott@fb.com>
-
- 08 Mar, 2021 1 commit
-
-
Min Xu authored
* [fix]: handle inputs with containers
  - this is an issue surfaced by vissl as well
  - fix seems to be super simple
  - also cleaned up two tests with respect to multiple such tests running back to back (they don't do that presently)
* cleanup
* fix
* lint
-
- 06 Mar, 2021 1 commit
-
-
Myle Ott authored
-
- 05 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* [perf][minor] cache the rank lookups, small shardedddp perf fix
* tiny improvement, code quality
-
- 04 Mar, 2021 1 commit
-
-
Min Xu authored
* [feat]: checkpoint and normalization
  - added special handling of BN for track_running_stats and checkpointing
  - we test BN/LN and checkpointing
  - we test them with mixed precision
-
- 01 Mar, 2021 1 commit
-
-
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency
* add test list files
* fix
* add test list files
* split benchmark run into 2 runs
* fix 1.8 version and balance benchmarks
* fix
* fix
* fix
* fix
* recording tests
* py39 install fix
* test again
* move tests
* reorg tests
* skip tests for torch 1.8 due to an upstream bug
* removed __init__.py from tests since it confuses pytest
* Revert "removed __init__.py from tests since it confuses pytest"
  This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
* don't include __init__ in file list
* notes on __init__.py and added missing ones
* fixed mypy in a test file
* balance test runtime
* better pip install
* balance more
* pip fix
* balance
* balance more, all tests should finish within 20m now
* minor license update
* trying cu102
* more doc and addressed Ben's comments
* debugging
* debugging
* better capture the errors
* debugging
* fix pyenv command
* add universe repo
* update to cuda 11 for 171
* add a test file, improved the checking script
-