- 20 Oct, 2021 1 commit
anj-s authored
* add log for new memory tracker features
-
- 20 Sep, 2021 1 commit
tmarkstrum authored
* [chore] 0.4.1 release
* put more details in one change log
-
- 13 Sep, 2021 1 commit
Benjamin Lefaudeux authored
-
- 12 Sep, 2021 1 commit
Min Xu authored
* add changelog for previous commit
* fix a merge-induced error
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 05 Sep, 2021 1 commit
Min Xu authored
* [bug] [FSDP] make sure we use full params for multiple backward passes within an iteration
* changelog
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 12 Aug, 2021 2 commits
anj-s authored
-
Min Xu authored
* minor: changelog and pre-commit
* addressed comment
* update the release doc
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 01 Aug, 2021 1 commit
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 31 Jul, 2021 1 commit
Myle Ott authored
* Add test (broken) for gradient accumulation without the no_sync context manager
* changelog
* no_sync to grad_acc renaming for tests
* clean up tmp files
* support grad acc without no_sync
* minor
* update changelog
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py (better assertion from Sam)
* lint
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
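In practical terms, gradient accumulation now works whether or not the intermediate micro-batches run under no_sync(). A minimal sketch, assuming a single-rank process group purely for illustration (real use is multi-process, typically with NCCL; no_sync() remains the cheaper path because it skips the per-micro-batch gradient reduction):

```python
import os
import torch
import torch.distributed as dist
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Single-rank process group, purely for illustration.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
use_cuda = torch.cuda.is_available()
dist.init_process_group(backend="nccl" if use_cuda else "gloo", rank=0, world_size=1)
device = torch.device("cuda" if use_cuda else "cpu")

model = FSDP(torch.nn.Linear(8, 8).to(device))
optim = torch.optim.SGD(model.parameters(), lr=0.01)

# Micro-batch 1: accumulate while skipping the gradient reduction (the pre-existing path).
with model.no_sync():
    model(torch.rand(4, 8, device=device)).sum().backward()

# Micro-batch 2: a plain backward also accumulates correctly (what this change adds),
# at the cost of an extra gradient reduction per micro-batch.
model(torch.rand(4, 8, device=device)).sum().backward()

optim.step()
optim.zero_grad()
```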
-
- 27 Jul, 2021 2 commits
Min Xu authored
* [chore] 0.3.9 release
* update changelog
* address comments
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Benjamin Lefaudeux authored
-
- 26 Jul, 2021 1 commit
Min Xu authored
* [feat] FSDP: supporting multiple flatten parameter groups - step 3: make FSDP use FlattenParamsWrapper unconditionally
* fixing the auto_wrap tests
* minor
* rewrite local_metadata_dict - updated FPW so that custom flat param names are also supported
* bug fix
* mypy
* rewrote consolidate_shard_weights - test_consolidate passes
* comments
* fixing pickling
* Fix shared params and MoE logic (#749)
* add strict kwarg to support fairseq:gshard MoE saving logic
* Test fairseq-style shard
* style
* formatting and address comments
* added changelog
* fixing a test after padding renaming
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
- 12 Jul, 2021 1 commit
anj-s authored
-
- 21 Jun, 2021 1 commit
Min Xu authored
* [feat] FSDP: supporting multiple flatten parameter groups - step 2: extend FPW to support multiple flat param groups; FSDP still uses only one group; unit tests exercise the new code paths; updated the changelog
* first cut, mypy passed
* test_flatten_params_wrapper.py::TestFlattenParams tests pass
* added two more test cases and fixed a case in the code
* fixed one bug with param_path_infos
* fixed two more tests with hardcoded flat_param names
* Update CHANGELOG.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 11 Jun, 2021 1 commit
Pete authored
* add failing test
* add fix
* use 'torch.is_grad_enabled()' instead of 'module.training'
* Revert "add failing test" (reverts commit 1c34242208f9b2c5fa6c8f181434c2be6d7cdbc0)
* add simple test
* improve test
* add check for fwd_counter
* revert typing/format changes
* move to new test file
* CHANGELOG
* remove old test
* fix import order
* fix test to be compatible with torch 1.6.0
* clean up
* comments
* isort 🤦
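The practical effect: an activation-checkpointed module now follows torch.is_grad_enabled() rather than module.training, so forwards under no_grad() skip the checkpointing machinery. A minimal sketch, assuming the checkpoint_wrapper export from fairscale.nn (paths moved around in the May 2021 rename):

```python
import torch
from fairscale.nn import checkpoint_wrapper

block = checkpoint_wrapper(
    torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
)
x = torch.rand(2, 16, requires_grad=True)

# Grad enabled: activations are recomputed during backward as usual.
block(x).sum().backward()

# Grad disabled: with this fix the wrapper takes the plain forward path,
# even though block.training is still True.
with torch.no_grad():
    _ = block(x)
```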
-
- 01 Jun, 2021 1 commit
Pete authored
* add failing test for buffer dtype
* fix buffer dtype issue
* update CHANGELOG
* fix
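For context, buffer precision is controlled separately from parameter precision via FSDP's buffer_dtype argument; a hedged sketch (the wrapped module is a placeholder, and an initialized process group is assumed):

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# `net` stands in for any nn.Module; a process group must already be initialized.
net = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.BatchNorm1d(32)).cuda()

model = FSDP(
    net,
    mixed_precision=True,        # fp16 parameters/gradients for compute and communication
    buffer_dtype=torch.float32,  # keep module buffers (e.g. BatchNorm statistics) in fp32
)
```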
-
- 28 May, 2021 1 commit
Min Xu authored
* [do not merge] testing a corner case
* workaround
* using a dummy tensor to fix
* lint
* changelog
* update a comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 18 May, 2021 1 commit
Min Xu authored
* [chore] 0.3.7 release
* fixed changelog
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 17 May, 2021 2 commits
Min Xu authored
* [fix] auto_wrap: support wrapping based on wrapper_config - users can use this to avoid the assert when auto_wrap is used multiple times on a module; users can traverse the modules multiple times, assign a wrapper_config to each module, and then call auto_wrap once to wrap them (fixes #649, fixes #585)
* added changelog
* fix tests
* fix a test
* added an optional assert for collisions, based on discussions with Quentin
* added config_auto_wrap_policy
* lint
Co-authored-by: Min Xu <min.xu.public@gmail.com>
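A rough sketch of the intended flow, treating the wrapper_config attribute as a dict of FSDP kwargs and using the enable_wrap/auto_wrap helpers from fairscale.nn.wrap; exact argument placement has shifted between fairscale versions, so this is illustrative rather than authoritative:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.nn.wrap import auto_wrap, config_auto_wrap_policy, enable_wrap

model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.Linear(32, 8))

# Traverse the model as many times as needed, tagging the modules to wrap.
for layer in model:
    layer.wrapper_config = {"mixed_precision": True}  # assumed: kwargs forwarded to FSDP

# One auto_wrap pass: only modules carrying wrapper_config get wrapped.
with enable_wrap(wrapper_cls=FSDP):
    model = auto_wrap(model, auto_wrap_policy=config_auto_wrap_policy)
```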
-
Quentin Duval authored
* Save FSDP metadata for offline unflattening
* Complete the metadata-saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstructs a consolidated checkpoint from a sharded checkpoint
* Add a unit test to show how to use the function
* Code review + improvement of the unit tests
* Code review: extract clean_path
* Make metadata saving and consolidation of checkpoints work for flatten_parameters=False
* Add new unit test file in CI
* Complete changelog and fix mypy issues
* Add support for module buffers in the consolidation of sharded checkpoints
* Better support for module buffers: save them in the metadata
* Refactoring: use a data format for the metadata that is simpler to understand (move from an object-of-arrays to an array-of-objects format)
* Renaming to make the code clearer
* Code review: in_temporary_directory rework and typo correction
* Renaming
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
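A hedged sketch of the resulting workflow, using the method names that appear in this history (local_state_dict, local_metadata_dict, consolidate_shard_weights); fsdp_model, rank, and world_size are placeholders, and exact signatures may differ by fairscale version:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# On every rank: save the sharded weights plus the metadata needed to
# unflatten and reassemble them offline.
torch.save(
    {"weights": fsdp_model.local_state_dict(), "meta": fsdp_model.local_metadata_dict()},
    f"shard_{rank}.pt",
)

# Later, on a single host: rebuild a consolidated (unsharded) state dict.
shards = [torch.load(f"shard_{r}.pt") for r in range(world_size)]
full_state_dict = FSDP.consolidate_shard_weights(
    shard_weights=[s["weights"] for s in shards],
    shard_metadata=[s["meta"] for s in shards],
)
```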
-
- 14 May, 2021 1 commit
anj-s authored
* api changes
* fix list
* modify changelog
* move function
-
- 13 May, 2021 1 commit
Min Xu authored
* [fix] add and use get_process_group_cached - this makes FSDP avoid creating too many process groups by default; extra process groups are bad for GPU memory and init time
* add changelog
* lint
* note on speed
* add better assert output (the test seems to be flaky: https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps)
* update test reference memory values - with cached process groups, the memory reported by pytorch is also reduced (due to the bucket buffer memory for the reduction buffer); the bigger effect is on the SMI memory, which is not reported by pytorch and is what this test checks
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* Update CHANGELOG.md
* Update fairscale/utils/parallel.py
* improved changelog
* better handling of underscores in the md file
Co-authored-by: Min Xu <min.xu@acm.org>
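A hedged sketch of how callers can share the cached group rather than creating one communicator per wrapper; the no-argument call to get_process_group_cached and its location in fairscale.utils.parallel are assumptions based on this commit:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.utils.parallel import get_process_group_cached

# Repeated calls return the same ProcessGroup instead of allocating a fresh
# NCCL communicator (and its GPU-side buffers) for every FSDP instance.
pg = get_process_group_cached()

inner = FSDP(torch.nn.Linear(64, 64).cuda(), process_group=pg)
outer = FSDP(torch.nn.Sequential(inner, torch.nn.Linear(64, 8).cuda()), process_group=pg)
```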
-
- 12 May, 2021 1 commit
anj-s authored
* rename files
* add newly renamed file
* rename and move checkpoint activations related files
* add test files to ci list
* fix lint errors
* modify docs
* add changelog
* retain old path for now
* add another import test case
* fix merge conflict
* add missing test file
-
- 11 May, 2021 1 commit
Min Xu authored
* [fix] FSDP forward pass overlap between compute and all-gather - many thanks to @cyanguwa for the report and @QuentinDuval for debugging it; a new unit test is added to check for this and to ensure we detect issues with overlapping and with cpu/gpu blocking wait calls
* fix
* better assertion outputs
* fix format and tune all_gather mb for CI
* more tuning with non_flatten
* undo an accidental change
* tuning all-gather mb and del model
* Update + fix overlapping test to use patched all_gather w/ delay (#672)
* fixing get_cycles_per_ms
* add get_smi_memory
* update the docstring
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Myle Ott <myleott@fb.com>
-
- 07 May, 2021 1 commit
Min Xu authored
* [test] add a more general test case - also rebalance the tests a bit
* added missing arg
* balance
* better checking
* make test smaller and faster
* make ddp results cached and enable sync_bn
* clean up
* fix tests
* changelog
* fix
* addressing comments
Co-authored-by: Min Xu <min.xu@acm.org>
-
- 05 May, 2021 2 commits
Min Xu authored
* [fix] better assert and better test for frozen weights - the precise condition should check m.parameters(), not m.params (fixes #643)
* add changelog
* using an enum is so much better
Co-authored-by: Min Xu <min.xu@acm.org>
-
Min Xu authored
* [fix] add clear_autocast_cache flag - when training in AMP mode with fp32 weights, FSDP may need to optionally clear the autocast cache to avoid GPU OOM; the flag defaults to false, and clearing automatically is a future TODO; also added a verbose flag to make print(fsdp_model) a bit shorter; updated the memory test to cover the new code; added a couple of useful functions in parallel.py and testing.py
* minor
* address comments
* format
* improve the test
Co-authored-by: Min Xu <min.xu@acm.org>
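A minimal sketch of the two flags added here (what verbose prints and when the cache is cleared are assumptions beyond the description above):

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes an initialized process group and a CUDA device.
model = FSDP(
    torch.nn.Linear(1024, 1024).cuda(),
    mixed_precision=True,
    clear_autocast_cache=True,  # let FSDP clear torch.cuda.amp's autocast cache to avoid GPU OOM (off by default)
    verbose=True,               # controls how much detail print(fsdp_model) shows
)
print(model)
```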
-
- 03 May, 2021 1 commit
Benjamin Lefaudeux authored
* fix + unit test
* changelog update
-
- 28 Apr, 2021 2 commits
msbaines authored
-
Min Xu authored
* [feat] save memory by using the bucket buffer only in backward - this fixes bug #627; added documentation to clarify the buffer's cost and the speed/memory tradeoff; added setup/teardown calls so that the buffer is only allocated during the backward pass, freeing more memory for the forward pass and the optimizer step (e.g. for activations); added a unit test that asserts the memory is in range. Compared with DDP: 1. the buffer size scales with the number of FSDP instances, not with model size; 2. the buffer is only allocated during backward; 3. the buffer is used for small tensors only, to reduce overhead; 4. the overlap of compute and reduction is very different
* add PR number to changelog
* filled in the memory numbers on 1.9
* addressed comments
* update comments
* fix for 1.6
* add a todo
Co-authored-by: Min Xu <min.xu@acm.org>
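For reference, the reduction bucket described above is sized by FSDP's bucket_cap_mb argument; a hedged sketch (the zero-disables convention is an assumption):

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes an initialized process group and a CUDA device.
model = FSDP(
    torch.nn.Linear(1024, 1024).cuda(),
    bucket_cap_mb=25,  # small gradients are coalesced into a ~25 MB bucket for reduction
)
# bucket_cap_mb=0 is assumed to disable bucketing entirely, trading reduction
# overhead for getting the buffer memory back.
```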
-
- 26 Apr, 2021 1 commit
Min Xu authored
* [chore] 0.3.6 release
* try redo the caches
Co-authored-by: Min Xu <min.xu@acm.org>
-
- 19 Apr, 2021 1 commit
Min Xu authored
* [chore] 0.3.5 release
* address comment
Co-authored-by: Min Xu <min.xu@acm.org>
-
- 13 Apr, 2021 1 commit
Benjamin Lefaudeux authored
-
- 02 Apr, 2021 1 commit
Min Xu authored
* releasing 0.3.3 - I need it in vissl for the auto_wrap_bn change
-
- 18 Mar, 2021 3 commits
Min Xu authored
-
Min Xu authored
* [feat] FSDP: add auto_wrap_bn - add a utility function to handle wrapping of BN layers
* changelog
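A hedged sketch of the intended usage: wrap the BatchNorm layers first, then shard the rest. The import path and what the helper configures internally (e.g. full precision, unflattened params) are assumptions and have moved between fairscale versions:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.nn.wrap import auto_wrap_bn  # import path may differ across versions

# Assumes an initialized process group and a CUDA device.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
).cuda()

# Put the BN layers into their own wrappers first, then shard the remaining parameters.
model = auto_wrap_bn(model)
model = FSDP(model, mixed_precision=True)
```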
-
Min Xu authored
* [feature] FSDP: enable pytorch SyncBN - not fully validated yet, but at least it no longer asserts; this enables VISSL to move forward with its next PR
* add the test file
* changelog and lint
* addressed comment
-
- 12 Mar, 2021 1 commit
Min Xu authored
* FSDP: multi-pass autograd graph and mixed precision - added BACKWARD_PRE/POST checking; better assert_state; fixed an issue with the backward hook misfiring
* fix
* cleanup
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Myle Ott <myleott@fb.com>
-
- 11 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* Adding a hard sync barrier before the broadcast - mostly useful for Gloo; NCCL is synced behind the scenes
* adding a proper unit test
* adding a unit test for https://github.com/facebookresearch/fairscale/pull/510
-
- 09 Mar, 2021 1 commit
Min Xu authored
* [chore] 0.3.1 release - mainly because vissl needs the new version; added a doc on release steps
* Update CHANGELOG.md
* review comments
Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com>
-