- 04 Apr, 2021 1 commit
-
-
msbaines authored
This test is flaky for torch >= 1.8.0.
-
- 02 Apr, 2021 1 commit
-
-
msbaines authored
NCCL all_to_all is now supported in PyTorch (since v1.8.0) Fixes: #548
-
- 01 Apr, 2021 1 commit
-
-
msbaines authored
-
- 31 Mar, 2021 2 commits
-
-
Min Xu authored
[fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556) * [fix] disable single rank process group for auto_wrap_bn - beefed up unit test with regnet-like model - found that single-rank process group is causing problem - disabled it to enable convergence tests on the vissl side - use `raise e from None` to get a better assertion output in testing.py. * [test] fix regnet test for ddp+mixed_precision - need AMP context in FSDP - workaround different between ddp & fsdp when bias=True - fixed a bug in input data generation that caused different ranks have the same data with wrong iteration count. - added TODO for need a better loss and grad_scaler and reduced iters so there is no nan. - added a (disabled) debugging code * lint * lint * add scaler * lint * scaler * add a real loss * seeding in the ranks * blance tests * run AMP DDP==FSDP test only on cuda version 11 and up * add relu inplace and comment * make wrap_bn covers more cases in full precision mode
-
msbaines authored
-
- 30 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* survive the model being moved to device post-construction * make sure that a unit test would catch a regression
-
- 26 Mar, 2021 1 commit
-
-
Min Xu authored
- added DDP equivalency test - added rmf, state_dict_norm functions to testing utils - added more debugging output to objects_are_equal
-
- 25 Mar, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* re-activating unit test * removing changed that slipped in
-
Sam Shleifer authored
Co-authored-by:Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 22 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 20 Mar, 2021 1 commit
-
-
Myle Ott authored
* Add new test for weight init (fails) * Set FSDP.compute_device so summon_full_params works before module moves to CUDA * Override FSDP.apply to enable custom weight init
-
- 19 Mar, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* param buckets * unifying the buckets
-
msbaines authored
-
- 18 Mar, 2021 4 commits
-
-
Benjamin Lefaudeux authored
* extracting the buckets in a dedicated class, fixing the resize_ bug * adding a unit test * copyright
-
Min Xu authored
* [feat] FSDP: add auto_wrap_bn - add an utility function to handle wrapping of BN * changelog
-
Min Xu authored
* [feature] FSDP: enable pytorch SyncBN - not fully validated yet but at least not asserting - this enables VISSL to move forward with its next PR * add the test file * changelog and lint * addressed comment
-
Benjamin Lefaudeux authored
-
- 17 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* Deactivating buckets for a single rank, not crashing but not useful
-
- 12 Mar, 2021 2 commits
-
-
Min Xu authored
* FSDP: multi-pass autograd graph and mixed precision - added BACKWARD_PRE/POST checking - better assert_state - fixed issue of backward hook misfiring * fix * cleanup * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Co-authored-by:
Myle Ott <myleott@fb.com> Co-authored-by:
Myle Ott <myleott@fb.com>
-
msbaines authored
-
- 11 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* Adding a hard sync barrier before the broadcast, mostly useful for Gloo actually, NCCL is synced behind the scene * adding a proper unit test * adding a unit test for https://github.com/facebookresearch/fairscale/pull/510
-
- 09 Mar, 2021 2 commits
- 08 Mar, 2021 1 commit
-
-
Min Xu authored
* [fix]: handle inputs with containers - this is an issue surfaces by vissl as well - fix seems to be super simple - also cleaned up two tests with respect to multiple such tests running back to back (they don't do that presently) * cleanup * fix * lint
-
- 06 Mar, 2021 1 commit
-
-
Myle Ott authored
-
- 05 Mar, 2021 2 commits
-
-
Min Xu authored
* [refactor] enhance wrap and auto_wrap - Two things were done in this PR 1. We don't need to import FSDP in wrap.py since the wrapper class type is stored in the context now. 2. We can use a `auto_wrap_policy` function to customize wrapping policy for auto_wrap, including size of module, blacklist, exclude list - The auto_wrap function got simplified a bit as a minor side effect. * Update fairscale/nn/wrap/auto_wrap.py Co-authored-by:Sean Naren <sean@grid.ai> * addressed comments * addressed more comments Co-authored-by:
Sean Naren <sean@grid.ai>
-
Benjamin Lefaudeux authored
* [perf][minor] cache the rank lookups, small shardedddp perf fix * tiny improvement, code quality
-
- 04 Mar, 2021 2 commits
-
-
Min Xu authored
* [feat]: checkpoint and normalization - added special handling of BN for track_running_stats and checkpointing - we test BN/LN and checkpointing - we test them with mixed precision
-
Sam Shleifer authored
-
- 03 Mar, 2021 2 commits
- 02 Mar, 2021 2 commits
-
-
Myle Ott authored
-
Sean Naren authored
This adds a context manager that assists in making child modules with similar defaults. Usage: ``` from fairscale.nn.misc import enable_wrap, wrap with enable_wrap(**handleful_of_important_params): layer_1 = wrap(torch.nn.Linear(5, 5)) layer_2 = wrap(torch.nn.Linear(5, 5), flatten_parameters=True) # Override parameters if you'd like # without the context manager, creates Linear layer layer_1 = wrap(torch.nn.Linear(5, 5)) ``` If not within the FSDP context, this would be a no-op. This makes it easier to annotate layers without having to copy any changes in parameters.
-
- 01 Mar, 2021 2 commits
-
-
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency * add test list files * fix * add test list files * split benchmark run into 2 runs * fix 1.8 version and balance benchmarks * fix * fix * fix * fix * recording tests * py39 install fix * test again * move tests * reorg tests * skip tests for torch 1.8 due to an upstream bug * removed __init__.py from tests since it confuses pytest * Revert "removed __init__.py from tests since it confuses pytest" This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0. * don't include __init__ in file list * notes on __init__.py and added missing ones * fixed mypy in a test file * balance test runtime * better pip install * balance more * pip fix * balance * balance more, all test should finish within 20m now * minor license update * trying cu102 * more doc and addressed Ben's comments * debugging * debugging...
-
Min Xu authored
* [test] FSDP: add the failing test for #421 * skip on 1.5 * better skipping * Update tests/nn/data_parallel/test_fsdp_grad_scaler.py Co-authored-by:
Sam Shleifer <sshleifer@gmail.com> Co-authored-by:
Sam Shleifer <sshleifer@gmail.com>
-
- 27 Feb, 2021 1 commit
-
-
Min Xu authored
* [fix] FSDP corner case of all params at in the children * lint * fix * tradeoff * fix doc build * review comments
-
- 26 Feb, 2021 3 commits
- 25 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* bring back a fix from FSDP, may help a few existing users
-