- 31 Mar, 2021 4 commits
-
-
msbaines authored
-
anj-s authored
* renaming/adding error messages * address comments * address comments * add more comments * add more comments
-
Min Xu authored
[fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556)
* [fix] disable single rank process group for auto_wrap_bn
  - beefed up unit test with regnet-like model
  - found that the single-rank process group was causing problems
  - disabled it to enable convergence tests on the vissl side
  - use `raise e from None` to get a better assertion output in testing.py.
* [test] fix regnet test for ddp+mixed_precision
  - need AMP context in FSDP
  - work around the difference between ddp & fsdp when bias=True
  - fixed a bug in input data generation that caused different ranks to have the same data with wrong iteration count.
  - added a TODO about needing a better loss and grad_scaler, and reduced iters so there is no nan.
  - added (disabled) debugging code
* lint
* lint
* add scaler
* lint
* scaler
* add a real loss
* seeding in the ranks
* balance tests
* run AMP DDP==FSDP test only on cuda version 11 and up
* add relu inplace and comment
* make wrap_bn cover more cases in full precision mode
-
msbaines authored
-
- 30 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* survive the model being moved to device post-construction * make sure that a unit test would catch a regression
-
- 29 Mar, 2021 1 commit
-
-
msbaines authored
-
- 28 Mar, 2021 1 commit
-
-
msbaines authored
-
- 26 Mar, 2021 1 commit
-
-
Min Xu authored
- added DDP equivalency test - added rmf, state_dict_norm functions to testing utils - added more debugging output to objects_are_equal
-
- 25 Mar, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* re-activating unit test * removing changes that slipped in
-
Sam Shleifer authored
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 22 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 20 Mar, 2021 1 commit
-
-
Myle Ott authored
* Add new test for weight init (fails) * Set FSDP.compute_device so summon_full_params works before module moves to CUDA * Override FSDP.apply to enable custom weight init
-
- 19 Mar, 2021 3 commits
-
-
Benjamin Lefaudeux authored
* param buckets * unifying the buckets
-
msbaines authored
-
msbaines authored
-
- 18 Mar, 2021 5 commits
-
-
Benjamin Lefaudeux authored
* extracting the buckets in a dedicated class, fixing the resize_ bug * adding a unit test * copyright
-
Benjamin Lefaudeux authored
* enabling disabled tests
-
Min Xu authored
* [feat] FSDP: add auto_wrap_bn - add a utility function to handle wrapping of BN * changelog
-
Min Xu authored
* [feature] FSDP: enable pytorch SyncBN - not fully validated yet but at least not asserting - this enables VISSL to move forward with its next PR * add the test file * changelog and lint * addressed comment
-
Benjamin Lefaudeux authored
-
- 17 Mar, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* Deactivating buckets for a single rank, not crashing but not useful
-
Benjamin Lefaudeux authored
-
- 15 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* extending the current state_dict interface, make it possible to do everything in a single call, and to checkpoint on all ranks
-
- 12 Mar, 2021 2 commits
-
-
Min Xu authored
* FSDP: multi-pass autograd graph and mixed precision - added BACKWARD_PRE/POST checking - better assert_state - fixed issue of backward hook misfiring * fix * cleanup * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Co-authored-by: Myle Ott <myleott@fb.com>
-
msbaines authored
-
- 11 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* Adding a hard sync barrier before the broadcast, mostly useful for Gloo actually, NCCL is synced behind the scenes * adding a proper unit test * adding a unit test for https://github.com/facebookresearch/fairscale/pull/510
-
- 09 Mar, 2021 3 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Benjamin Lefaudeux authored
-
- 08 Mar, 2021 2 commits
-
-
Sean Naren authored
* Fix packed sequence apply * Update fairscale/utils/containers.py Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
Min Xu authored
* [fix]: handle inputs with containers - this is an issue surfaced by vissl as well - fix seems to be super simple - also cleaned up two tests with respect to multiple such tests running back to back (they don't do that presently) * cleanup * fix * lint
-
- 06 Mar, 2021 1 commit
-
-
Myle Ott authored
-
- 05 Mar, 2021 3 commits
-
-
Min Xu authored
* [refactor] enhance wrap and auto_wrap
  - Two things were done in this PR
    1. We don't need to import FSDP in wrap.py since the wrapper class type is stored in the context now.
    2. We can use an `auto_wrap_policy` function to customize wrapping policy for auto_wrap, including size of module, blacklist, exclude list
  - The auto_wrap function got simplified a bit as a minor side effect.
* Update fairscale/nn/wrap/auto_wrap.py Co-authored-by: Sean Naren <sean@grid.ai>
* addressed comments
* addressed more comments
Co-authored-by: Sean Naren <sean@grid.ai>
-
Benjamin Lefaudeux authored
* [perf][minor] cache the rank lookups, small shardedddp perf fix * tiny improvement, code quality
-
Benjamin Lefaudeux authored
* change empty shard handling for OSS, do not rely on asserts * code review
-
- 04 Mar, 2021 5 commits
-
-
Min Xu authored
* [feat]: checkpoint and normalization - added special handling of BN for track_running_stats and checkpointing - we test BN/LN and checkpointing - we test them with mixed precision
-
Sam Shleifer authored
-
Siddharth Goyal authored
* Fix ampnet unit test by adding delegate object * Remove comments
-
Min Xu authored
- cover them in terms of code path only - numerically, AdaScale is different on SDP/FSDP than DDP, mainly due to partial view of the gradients. - this doesn't mean it is definitely not useful but it is yet to be validated. - not going to spend too much time until we have a real use case.
-
Min Xu authored
* [chore] move a test script * add a shortcut for installing * more skipping * keep apt-get part
-