- 03 May, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* fix + unit test * changelog update
-
- 28 Apr, 2021 2 commits
-
-
msbaines authored
-
Min Xu authored
* [feat] save memory by using the bucket buffer only in backward
  - fixes bug #627
  - added documentation to clarify the buffer's cost and the speed/memory tradeoff
  - added setup/teardown calls so that the buffer is only allocated during the backward pass, freeing more memory during forward and stepping for things like activations
  - added a unit test that asserts the memory stays in range
  - comparing with DDP:
    1. buffer size scales with the number of FSDP instances, not model size
    2. buffer is only allocated during backward
    3. buffer is used for small tensors only, to reduce overhead
    4. overlapping of compute and reduction is very different
* add PR number to changelog
* filled in memory numbers on 1.9
* addressed comments
* update comments
* fix for 1.6
* add a todo
Co-authored-by: Min Xu <min.xu@acm.org>
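A minimal sketch of the knob this touches, assuming the fairscale FSDP of this era exposes bucket_cap_mb for the reduce-scatter bucket buffer (illustrative, not code from the commit):

    import torch
    import torch.distributed as dist
    from torch import nn
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

    # Assumes a single-GPU process group; real jobs launch one process per rank.
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1)
    torch.cuda.set_device(0)

    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
    fsdp = FSDP(model, bucket_cap_mb=25)  # buffer scales with this cap and the number of FSDP instances, not model size

    out = fsdp(torch.randn(8, 1024, device="cuda"))
    out.sum().backward()  # per this change, the bucket buffer is allocated here and torn down after backward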
-
- 26 Apr, 2021 1 commit
-
-
Min Xu authored
* [chore] 0.3.6 release * try redoing the caches Co-authored-by: Min Xu <min.xu@acm.org>
-
- 19 Apr, 2021 1 commit
-
-
Min Xu authored
* [chore] 0.3.5 release * address comment Co-authored-by: Min Xu <min.xu@acm.org>
-
- 13 Apr, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 02 Apr, 2021 1 commit
-
-
Min Xu authored
- releasing 0.3.3 - I need it in vissl for the auto_wrap_bn change
-
- 18 Mar, 2021 3 commits
-
-
Min Xu authored
-
Min Xu authored
* [feat] FSDP: add auto_wrap_bn - add a utility function to handle wrapping of BN * changelog
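A hedged sketch of how the utility is meant to be used; the import path of auto_wrap_bn is an assumption and may differ by fairscale version (illustrative, not code from the commit):

    from torch import nn
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
    from fairscale.nn.wrap import auto_wrap_bn  # assumed import path

    # Assumes torch.distributed is already initialized (see the earlier sketch).
    net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
    net = auto_wrap_bn(net)   # BN layers get their own small FSDP wrappers instead of being flattened with the rest
    net = FSDP(net.cuda())    # shard everything else as usual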
-
Min Xu authored
* [feature] FSDP: enable pytorch SyncBN
  - not fully validated yet, but at least not asserting
  - this enables VISSL to move forward with its next PR
* add the test file
* changelog and lint
* addressed comment
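A minimal sketch of the intended combination, using PyTorch's stock SyncBatchNorm conversion before wrapping with FSDP (illustrative; the commit's own test may differ):

    from torch import nn
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

    # Assumes an initialized NCCL process group (SyncBN needs one to sync stats).
    net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
    net = nn.SyncBatchNorm.convert_sync_batchnorm(net)  # swap BatchNorm for PyTorch SyncBN
    net = FSDP(net.cuda())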
-
- 12 Mar, 2021 1 commit
-
-
Min Xu authored
* FSDP: multi-pass autograd graph and mixed precision
  - added BACKWARD_PRE/POST checking
  - better assert_state
  - fixed an issue of the backward hook misfiring
* fix
* cleanup
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Myle Ott <myleott@fb.com>
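An illustration of the usage pattern the hook-state checking guards, reusing the fsdp-wrapped model from the first sketch above; batch_a and batch_b are hypothetical inputs (not code from the commit):

    import torch

    # The same FSDP-wrapped module appears in the autograd graph twice before a
    # single backward, so BACKWARD_PRE/POST must fire in a consistent order
    # instead of the hooks misfiring on each call.
    batch_a = torch.randn(8, 1024, device="cuda")
    batch_b = torch.randn(8, 1024, device="cuda")
    loss = fsdp(batch_a).sum() + fsdp(batch_b).sum()
    loss.backward()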
-
- 11 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* Adding a hard sync barrier before the broadcast; mostly useful for Gloo, since NCCL syncs behind the scenes * adding a proper unit test * adding a unit test for https://github.com/facebookresearch/fairscale/pull/510
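The pattern in question, sketched with plain torch.distributed calls (assumes an initialized process group; not the PR's actual code):

    import torch
    import torch.distributed as dist

    # Make sure every rank has reached this point before broadcasting; this
    # matters for Gloo, while NCCL already serializes the collective internally.
    dist.barrier()
    t = torch.arange(10.0) if dist.get_rank() == 0 else torch.zeros(10)
    dist.broadcast(t, src=0)  # all ranks now hold rank 0's tensor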
-
- 09 Mar, 2021 1 commit
-
-
Min Xu authored
* [chore] 0.3.1 release - mainly because vissl needs the new version - added a doc on release steps
* Update CHANGELOG.md
* review comments
Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com>
-
- 25 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* bring back a fix from FSDP, may help a few existing users
-
- 23 Feb, 2021 6 commits
-
-
Benjamin Lefaudeux authored
* v0.3.0 it is, celebration time
-
Benjamin Lefaudeux authored
* POC, testing against the DDP comm hook when available * docs, adding a reference to DDP's compress hook * updating changelog, prep for v0.1.8 release
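For reference, the upstream hook being compared against can be registered like this on a stock DDP model when the PyTorch version provides it (model and an initialized process group are assumed; this is PyTorch's API, not fairscale's):

    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

    ddp_model = DDP(model.cuda(), device_ids=[0])  # `model` is any nn.Module
    ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)  # fp16 gradient compression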
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
* [bug]: not all CUDA memory is freed when the model is deleted * fixed memory leak - without this, peak memory will be high when more than one model is trained (i.e. the first model leaves stuff around, pushing up the peak memory when the second model runs) * addressed comments * fix * changelog
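A sketch of the kind of check the fix enables; build_model and train are hypothetical helpers, not the commit's test:

    import gc
    import torch

    model = build_model().cuda()   # hypothetical factory
    train(model)                   # hypothetical training loop
    del model                      # drop the last reference so its CUDA tensors can be collected
    gc.collect()
    torch.cuda.empty_cache()       # release cached blocks back to the GPU
    print(torch.cuda.memory_allocated())  # with the leak fixed, this should be near zero before the next model trains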
-
Min Xu authored
-
- 22 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding an assert + corresponding unit test * updated changelog * adjusting the adascale tests
-
- 19 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 18 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* [fix] ShardedDDP train/eval modes * Update CHANGELOG.md
-
- 17 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* initial implementation, with unit test and assert * added changelog and better debug string
-
- 12 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* Better unit testing * Make it possible to refresh the DDP assumptions when the model has changed; make it optional so that you can save some time * Enabling accumulation tests
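A hedged sketch of the new knob, assuming the entry point is ShardedDDP's refresh_trainable() and that model and frozen_block are user-defined (not the PR's code):

    import torch
    from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP
    from fairscale.optim import OSS

    optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)  # assumes an initialized process group
    ddp = ShardedDDP(model, optimizer)

    for p in frozen_block.parameters():   # hypothetical submodule being frozen mid-training
        p.requires_grad = False
    ddp.refresh_trainable()               # assumed method name: re-derive the DDP assumptions for the new trainable set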
-
- 11 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* v0.1.6
-
- 03 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 02 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding a test to prove the interoperability with upstream pytorch * updating the changelog * eager state pruning * pytorch 1.5 compat
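A hedged illustration of the interoperability in question: OSS wrapping a stock torch.optim class and stepping it (assumes an initialized process group; this is not the commit's actual test):

    import torch
    from fairscale.optim import OSS

    params = [torch.nn.Parameter(torch.randn(4, 4)) for _ in range(2)]
    sharded = OSS(params, optim=torch.optim.SGD, lr=0.1, momentum=0.9)  # upstream optimizer class plugged straight in

    loss = sum((p ** 2).sum() for p in params)
    loss.backward()
    sharded.step()        # steps this rank's shard, then broadcasts the updated params
    sharded.zero_grad()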
-
- 29 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 07 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* trying to fix the missing files in the pip package (not in this diff) * adding a long description, more PyPI-friendly
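A sketch of the packaging change described, with illustrative metadata only (the project's real setup.py has more fields):

    from setuptools import find_packages, setup

    setup(
        name="fairscale",
        packages=find_packages(),
        # Give PyPI a proper long description sourced from the README.
        long_description=open("README.md", encoding="utf8").read(),
        long_description_content_type="text/markdown",
    )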
-
- 05 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
release pip package to follow suit
-
- 04 Jan, 2021 2 commits
-
-
Benjamin Lefaudeux authored
-
Min Xu authored
* [feat] sync adascale from internal repo - tbd testing: tbd
* Update the argument documentation of __init__
* update documentation around set_num_gradients_to_accumulate
* added checking code for proper API calling places
* rename internal APIs to make them internal
* updated changelog
* added support for add_param_group and its unit test
* added unit test for set_num_gradients_to_accumulate
* added debias_ewma unit test
* fixed test_set_num_gradients_to_accumulate (needs a zero_grad() call)
* added missing zero_grad() to test_lr_scheduler
* fixed test_add_param_group with respect to optim.zero_grad()
* added test_gradient_value
* added test_scale_not_equal_default for scale != world_size
* grad_accum
* added test_unhook()
* removed print statements
* fixed a typo
* addressed Ben's comment
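A hedged sketch of the synced API, assuming the wrapper is fairscale.optim.AdaScale and that extra_param exists; AdaScale needs either multiple workers or gradient accumulation to measure gradient variance (illustrative only):

    import torch
    from torch.optim import SGD
    from fairscale.optim import AdaScale

    params = [torch.nn.Parameter(torch.randn(10, 10))]
    optim = AdaScale(SGD(params, lr=0.1))   # wraps a stock optimizer; assumes an initialized process group

    (params[0] ** 2).sum().backward()
    optim.step()                            # lr is implicitly rescaled by the measured gain
    optim.zero_grad()
    print(optim.gain())                     # the adaptive scale factor

    optim.add_param_group({"params": [extra_param], "lr": 0.05})  # newly supported here; extra_param is hypothetical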
-
- 30 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* removing a dead call since ShardedDDP, small speedup * unrelated, but filling in the changelog * another nit
-
- 24 Dec, 2020 1 commit
-
-
Min Xu authored
* Update changelog: missed this item from the previous AdaScale commit * More changelog * Addressed review comments
-
- 03 Dec, 2020 1 commit
-
-
Min Xu authored
* added AdaScale to README
* [adascale] added gradient accumulation
  - added gradient accumulation
  - tested with full cifar trainings at different accumulation values and verified that full accuracy is obtained
  - also removed the patch optimize flag until we need it
* [adascale] adding pytest
  - added basic and ddp tests and grad_accum
  - closes #195
* added changelog
* added ddp grad_accum test
* moved ddp and non-ddp tests into separate files
* added checkpoint test
* more doc
* addressed Mike's comments
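A hedged sketch of the gradient-accumulation path added here; model and batches are assumed to exist, and the constructor kwarg name follows the tests mentioned above (illustrative, not the commit's code):

    from torch.optim import SGD
    from fairscale.optim import AdaScale

    K = 4  # micro-batches per optimizer step
    optim = AdaScale(SGD(model.parameters(), lr=0.1), num_gradients_to_accumulate=K)

    for i, batch in enumerate(batches):
        model(batch).sum().backward()   # AdaScale accumulates per-micro-batch gradient statistics
        if (i + 1) % K == 0:
            optim.step()                # a single optimizer step after K backwards
            optim.zero_grad()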
-
- 02 Dec, 2020 1 commit
-
-
msbaines authored
Fixes #190
-
- 01 Dec, 2020 1 commit
-
-
msbaines authored
-
- 15 Oct, 2020 1 commit
-
-
msbaines authored
-
- 28 Aug, 2020 1 commit
-
-
msbaines authored
-