- 11 Aug, 2022 1 commit
Min Xu authored
* added a profiling class
* no more type ignore after merging main
* fixed an int/round bug
* add unit tests
* skip if no cuda for a test
Co-authored-by: Min Xu <min.xu.public@gmail.com>
- 27 Jul, 2022 1 commit
Riyasat Ohib authored
* [Feat] dense to sst implementation
  1. Implementation of the dense_to_sst function.
  2. Calculates the threshold for both the top-k-element and the top-k-percentage (fraction) cases.
  3. Assertions to verify that top_k_element is smaller than the numel along the same dim.
  4. top_k_percent to top-k conversion.
  5. When calculating the SST, the real part of the complex dense_freq is now used instead of the magnitudes.
* [Feat, Tests] transform method addition, handling of top_k_element None case
  1. Adds a transform method.
  2. Adds code to handle the dim=None case for top_k_element.
* [Feat, Refactor] Reorganizations, new assertions and fixes.
  1. Uses XOR to validate that top-k percent and top-k element are neither both set nor both unset; exactly one must be set.
  3. Distills both top-k percent and top-k element to a single top-k value using a unified helper function.
  5. Adds a scatter topk values function to scatter values for SST and, in the future, DST.
  6. Validates the percentage range and ensures k is never 0.
  7. Uses config validation, including a check that top_k_element > 0 if it is not None.
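The top-k handling described in this commit (XOR validation, percent-to-k conversion, range checks, and k never being 0) can be sketched in plain Python. The helper `resolve_top_k` is hypothetical, not the fairscale API, and it assumes `top_k_percent` is expressed on a 0 to 100 scale.

```python
from typing import Optional


def resolve_top_k(
    numel: int,
    top_k_element: Optional[int] = None,
    top_k_percent: Optional[float] = None,
) -> int:
    """Hypothetical helper: turn either top-k option into a single k."""
    # XOR check: exactly one of the two options must be set.
    if (top_k_element is None) == (top_k_percent is None):
        raise ValueError("set exactly one of top_k_element or top_k_percent")
    if top_k_element is not None:
        if top_k_element <= 0:
            raise ValueError("top_k_element must be > 0")
        if top_k_element > numel:
            raise ValueError("top_k_element must not exceed numel along dim")
        return top_k_element
    # Assumption: the percentage lives in (0, 100].
    if not 0.0 < top_k_percent <= 100.0:
        raise ValueError("top_k_percent must be in (0, 100]")
    # Convert the percentage to a count and clamp so k is never 0.
    return max(1, int(numel * top_k_percent / 100.0))
```

For example, `resolve_top_k(10, top_k_percent=1)` yields 1 rather than 0 because of the clamp.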
- 14 Jun, 2022 1 commit
Riyasat Ohib authored
* [feat] Adds the implementation of the wgit add functionality, with sha1 hash creation, reference tracking, dependency graph creation and all related functionality for the wgit add method.
* [feat] Adds the wgit add and wgit commit functionalities and major refactors.
  1. Adds the wgit add and wgit commit functionalities to the API.
  2. Introduces a new PyGit class that wraps the internal .wgit/.git repo.
  3. Refactors the Repo class in the API and introduces some methods.
  4. Refactors all the classes so they no longer use @staticmethods and use object instances instead.
  5. Moves much of the directory path handling code from os.path to the pathlib library.
* [Feat] Combines the Repo and WeiGit classes. Separates classes into their own modules.
  1. Combines the functionality of the WeiGit and Repo classes into a single WeiGitRepo class.
  2. Classes are now separated into their own modules.
  3. Moves some functions and staticmethods to utils.
  4. Adds a range of tests for the add and commit functionalities of weigit.
* [fix] adds a new test to the ci_test_list_3
* [fix] test fix
* [fix] test fix
* [Feat] Directory restructuring, type checking and some standardization
  1. Restructured the directory and moved wgit to fairscale/experimental/wgit so that it can be found as a package when pip installed.
  2. Added a range of type checking.
  3. Some refactors.
* [Feat][Refactor] Directory restructuring, test addition and type checking
  1. Restructured the test directory.
  2. Added and modified a few wgit tests.
  3. Added some type checking to the code.
* test fix
* setup fix and repo checking added in cli
* [Feat] Better initialization and error handling for init and wgit subcommands. Test reorg.
* [refactor] Changes in classes, encapsulation and addition of PyGit test.
* [Feat][Refactor]
  1. Changed some class method arguments for better encapsulation in Sha1_store.
  2. Moved sha1 hash calculation within sha1_store.
  3. Some standardization and cleanup of unnecessary snippets.
  4. Added new tests for the PyGit and Sha1_Store classes.
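As a rough illustration of the sha1-based storage that `wgit add` builds on, the sketch below hashes a file with Python's `hashlib` and copies it into a content-addressed directory. The function `add_to_store` and the `<store>/<sha[:2]>/<sha[2:]>` layout are assumptions borrowed from git's object layout, not the actual Sha1_store implementation.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha1 digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add_to_store(store_dir: Path, src: Path) -> str:
    """Hypothetical helper: copy src into a store keyed by its sha1."""
    sha = sha1_of_file(src)
    # Assumed git-like layout: first two hex chars form a subdirectory.
    dest = store_dir / sha[:2] / sha[2:]
    if not dest.exists():  # identical content is stored only once
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return sha
```

Keying objects by content hash means re-adding an unchanged file is a no-op, which is the property a reference-tracking layer can rely on.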
- 25 May, 2022 1 commit
Riyasat Ohib authored
* [feat] Adding wgit within fairscale/experimental/wgit.
* [feat] adding experimental wgit
- 06 Dec, 2021 1 commit
Freddy Snijder authored
Fix for KeyError that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) (#881)
* Fix for KeyError that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876)
* Styling fixes
* Updated the test to be independent of the Huggingface transformers package
* Added test for issue #876
* Small error message fix
* Skip test when CUDA is not available
* Fixed naming of model
- 18 Nov, 2021 1 commit
Min Xu authored
* [fix]: fix eval for shared weight FSDP
* fixing optim state saving
* add changelog
* reformat with newer local isort
* update test
* avoid computing reference state unless we are testing training
* added optim_state test
* make mypy happy
* move tests; maybe we need to put CUDA memory related tests first in the lists
Co-authored-by: Min Xu <min.xu.public@gmail.com>
- 08 Nov, 2021 1 commit
Benjamin Lefaudeux authored
Add SlowMo Distributed Data Parallel for clusters with slow interconnects
Co-authored-by: Vinayak Tantia <tantia.vinayak1@gmail.com>
- 07 May, 2021 1 commit
Min Xu authored
* [test]: add a more general test case
  - also rebalance the tests a bit
* added missing arg
* balance
* better checking
* balance
* make test smaller and faster
* make ddp results cached and enable sync_bn
* clean up
* fix tests
* changelog
* balance
* fix
* addressing comments
Co-authored-by: Min Xu <min.xu@acm.org>
- 31 Mar, 2021 1 commit
Min Xu authored
[fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556)
* [fix] disable single rank process group for auto_wrap_bn
  - beefed up unit test with regnet-like model
  - found that the single-rank process group is causing problems
  - disabled it to enable convergence tests on the vissl side
  - use `raise e from None` to get a better assertion output in testing.py.
* [test] fix regnet test for ddp+mixed_precision
  - need AMP context in FSDP
  - work around a difference between DDP and FSDP when bias=True
  - fixed a bug in input data generation that caused different ranks to have the same data with the wrong iteration count.
  - added a TODO noting the need for a better loss and grad_scaler, and reduced iters so there is no NaN.
  - added some (disabled) debugging code
* lint
* lint
* add scaler
* lint
* scaler
* add a real loss
* seeding in the ranks
* balance tests
* run AMP DDP==FSDP test only on CUDA version 11 and up
* add relu inplace and comment
* make wrap_bn cover more cases in full precision mode
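The `raise e from None` note refers to Python's exception-chaining syntax. The sketch below is a generic illustration of the idiom; `read_setting` is a hypothetical function, not code from testing.py. Raising with `from None` sets `__cause__` to None and `__suppress_context__` to True, so the traceback omits the "During handling of the above exception" section, while `__context__` is still recorded for debugging.

```python
def read_setting(settings: dict, key: str) -> str:
    """Hypothetical lookup that reports a missing key with a clean error."""
    try:
        return settings[key]
    except KeyError:
        # "from None" suppresses the implicit KeyError context, so the
        # printed traceback shows only this clearer error.
        raise ValueError(f"missing setting: {key!r}") from None
```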
- 25 Mar, 2021 1 commit
Sam Shleifer authored
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
- 18 Mar, 2021 1 commit
Min Xu authored
* [feature] FSDP: enable pytorch SyncBN
  - not fully validated yet, but at least not asserting
  - this enables VISSL to move forward with its next PR
* add the test file
* changelog and lint
* addressed comment
- 12 Mar, 2021 1 commit
Min Xu authored
* FSDP: multi-pass autograd graph and mixed precision
  - added BACKWARD_PRE/POST checking
  - better assert_state
  - fixed issue of backward hook misfiring
* fix
* cleanup
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Myle Ott <myleott@fb.com>
- 08 Mar, 2021 1 commit
Min Xu authored
* [fix]: handle inputs with containers
  - this issue was surfaced by vissl as well
  - the fix seems to be super simple
  - also cleaned up two tests with respect to running multiple such tests back to back (they don't do that presently)
* cleanup
* fix
* lint
- 06 Mar, 2021 1 commit
Myle Ott authored
- 05 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* [perf][minor] cache the rank lookups, small shardedddp perf fix
* tiny improvement, code quality
- 04 Mar, 2021 1 commit
Min Xu authored
* [feat]: checkpoint and normalization
  - added special handling of BN for track_running_stats and checkpointing
  - we test BN/LN and checkpointing
  - we test them with mixed precision
- 01 Mar, 2021 1 commit
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency
* add test list files
* fix
* add test list files
* split benchmark run into 2 runs
* fix 1.8 version and balance benchmarks
* fix
* fix
* fix
* fix
* recording tests
* py39 install fix
* test again
* move tests
* reorg tests
* skip tests for torch 1.8 due to an upstream bug
* removed __init__.py from tests since it confuses pytest
* Revert "removed __init__.py from tests since it confuses pytest"
  This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
* don't include __init__ in file list
* notes on __init__.py and added missing ones
* fixed mypy in a test file
* balance test runtime
* better pip install
* balance more
* pip fix
* balance
* balance more, all tests should finish within 20m now
* minor license update
* trying cu102
* more doc and addressed Ben's comments
* debugging
* debugging
* better capture the errors
* debugging
* fix pyenv command
* add universe repo
* update to cuda 11 for 171
* add a test file, improved the checking script