- 08 Aug, 2022 1 commit
Min Xu authored

* update examples and comment
* fixed issue with fft/ifft only doing the last dim
* fixed an int/round bug; fixed tests
* add cuda tests
* add atol and rtol
* skip cuda test correctly

Co-authored-by: Min Xu <min.xu.public@gmail.com>
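The fft/ifft fix above concerns the default transform dimension. As a standalone illustration in plain PyTorch (not the patched fairscale code): `torch.fft.fft`/`ifft` operate over a single dimension, the last one by default, so covering other dimensions needs an explicit `dim=` or the n-dimensional variants.

```python
import torch

x = torch.randn(4, 8)
last_dim = torch.fft.fft(x)          # implicit dim=-1: only the last dimension is transformed
first_dim = torch.fft.fft(x, dim=0)  # transform a specific dimension instead
all_dims = torch.fft.fftn(x)         # n-dimensional transform over every dimension
```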
- 19 Jul, 2022 1 commit
Min Xu authored

* formatting change, no logical change
* formatting and name change, no logical change
* [refactor] sha1_store's path arg
  - make sha1_store's path arg directly the path, not its parent
  - this is because sha1_store is not like a .git or a .wgit dir, which is nested inside another "working" dir; it is simply a store that uses a given dir
  - updated repo and tests as well
* remove a test warning due to a deprecated API from torch
* [refactor] change how dot_wgit_dir_path is used
  - it should only be assigned in __init__
  - we use it for error checking in the rest of the APIs
* simplify the init a bit
* refactor the sanity check
* moved some functions, no code change
* [feat] added per-tensor add to the repo
* enabled gzip compression on add
* fix a unit test
* add a note
* make sha1 store work on a general dict
* handle a general state_dict from a model, not just a module's one-level OrderedDict
* formatting

Co-authored-by: Min Xu <min.xu.public@gmail.com>
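The per-tensor add with gzip compression describes a content-addressed store. A toy sketch of that idea under the same assumptions (the function name and directory layout are illustrative only, not the actual fairscale/wgit API):

```python
import gzip
import hashlib
import io
from pathlib import Path

import torch

def add_tensor(store_dir: Path, t: torch.Tensor) -> str:
    """Toy content-addressed add: serialize the tensor, hash it, gzip it, store it under its SHA1."""
    buf = io.BytesIO()
    torch.save(t, buf)
    data = buf.getvalue()
    sha1 = hashlib.sha1(data).hexdigest()
    path = store_dir / sha1[:2] / sha1[2:]          # git-style fan-out directory
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(gzip.compress(data))
    return sha1
```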
- 29 Jun, 2022 1 commit
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
- 12 Jun, 2022 1 commit
Crutcher Dunnavant authored
- 14 Feb, 2022 1 commit
Min Xu authored

* update pytest versions
* [test] test-related changes
  - upgrade to newer pytorch versions
  - added a function to make tests more deterministic on A100 and TF32
  - fixed some tests so that they are correctly skipped on a single-GPU system
* more fixes
* formatting overly long lines
* format
* better test without triggering a warning
* fix an optim state bug with newer pytorch
  - the adam optimizer seems to return "step" as a singleton tensor now in the nightly build
  - this fixes it, assuming a non-tensor value can still be loaded back by the optimizer
* improve oss.py
  - using min_loss for regression checking is a bit more reliable
  - also increased the num epochs from 10 to 12
* small oss.py fix
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

Co-authored-by: Min Xu <min.xu.public@gmail.com>
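The optim-state fix responds to newer PyTorch builds storing Adam's "step" as a 0-dim tensor. A hedged sketch of the kind of normalization involved (the helper name is illustrative, not the fairscale code):

```python
import torch

def normalize_adam_steps(optim_state: dict) -> dict:
    """Convert tensor-valued "step" entries back to plain ints before loading the state dict."""
    for param_state in optim_state.get("state", {}).values():
        step = param_state.get("step")
        if isinstance(step, torch.Tensor):
            param_state["step"] = int(step.item())
    return optim_state
```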
- 11 Feb, 2022 1 commit
Min Xu authored

* skipping one more test
* formatting
* minor fix and copyright header
* comment

Co-authored-by: Min Xu <min.xu.public@gmail.com>
- 25 Jan, 2022 1 commit
Min Xu authored

* [fix] reduce unit test memory
* set seed in CI
* fix random seed function
* giving up CI, //sigh
- 12 Nov, 2021 1 commit
Anupam Bhatnagar authored

* adding pre-commit files
* applying pre-commit to all files
* adding no-strict-optional argument to mypy in circle ci config
* fix typo
* updating python versions
* [skip ci] remove extra args
* adding python 3.9
* [skip ci] set pre-commit version in requirements-dev.txt
* set CACHE_VERSION
* move linters from circleci to github actions
* update python version
* update python version in benchmarks_2
* moving to python 3.9.7
- 08 Nov, 2021 1 commit
Benjamin Lefaudeux authored

Add SlowMo Distributed Data Parallel for clusters with slow interconnects

Co-authored-by: Vinayak Tantia <tantia.vinayak1@gmail.com>
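A minimal usage sketch of the new wrapper; the import path, the constructor argument, and the perform_slowmo call are taken from memory of the fairscale docs and should be treated as assumptions:

```python
import torch
from fairscale.experimental.nn.data_parallel import SlowMoDistributedDataParallel

# Assumes torch.distributed is already initialized on every rank.
model = SlowMoDistributedDataParallel(torch.nn.Linear(8, 8).cuda(), nprocs_per_node=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for batch in data_loader:                # data_loader is a placeholder iterable of cuda tensors
    loss = model(batch).sum()
    loss.backward()
    optimizer.step()
    model.perform_slowmo(optimizer)      # periodic slow-momentum averaging step across workers
    optimizer.zero_grad()
```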
- 20 Oct, 2021 1 commit
Quentin Duval authored

* [feat] layer memory tracking
* [feat] layer memory tracking (add tests in CI)
* [feat] layer memory tracking: doc typos
* [feat] layer memory tracking: mypy fixes
* [feat] layer memory tracking: fixes for FSDP all-gather tracking on pytorch 1.9 and above
* [feat] layer memory tracking: lint
* [feat] layer memory tracking: mypy

Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
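The feature tracks GPU memory layer by layer. A generic sketch of the underlying idea using plain forward hooks (this is not the fairscale tracker's API, just an illustration of the technique):

```python
import torch

def attach_memory_hooks(model: torch.nn.Module):
    """Record CUDA memory allocated right after each module's forward pass."""
    records = []

    def hook(module, inputs, output):
        records.append((module.__class__.__name__, torch.cuda.memory_allocated()))

    handles = [m.register_forward_hook(hook) for m in model.modules()]
    return records, handles  # call h.remove() on each handle when done tracking
```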
- 31 Jul, 2021 1 commit
Myle Ott authored

* Add test (broken) for gradient accumulation without the no_sync context manager
* changelog
* no_sync to grad_acc renaming for tests
* clean up tmp files
* support grad acc without no_sync
* minor
* update changelog
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py: better assertion from Sam
* lint

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
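The change makes gradient accumulation work on an FSDP-wrapped model even without the no_sync context manager. A hedged sketch of both paths (a toy model; assumes the distributed process group is already initialized):

```python
import contextlib

import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(torch.nn.Linear(8, 8).cuda())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulate_steps = 2

for step in range(4):
    batch = torch.randn(4, 8, device="cuda")
    last_micro_batch = (step + 1) % accumulate_steps == 0
    # no_sync() skips gradient communication on intermediate micro-batches; after this
    # change, plain accumulation without the context manager should also work.
    ctx = contextlib.nullcontext() if last_micro_batch else model.no_sync()
    with ctx:
        model(batch).sum().backward()
    if last_micro_batch:
        optimizer.step()
        optimizer.zero_grad()
```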
- 26 Jun, 2021 1 commit
Pavel Belevich authored
- 08 Jun, 2021 1 commit
Min Xu authored

* refactoring FlattenParamsWrapper
  - use a FlatParameter class to encapsulate the logic of flattening and expanding into views
  - this will make it easier to have multiple groups of flattened parameters
* fixed testing context issues for both temp files and temp dirs
* fixing test_fsdp_metadata
* fix pickling of FlatParameter
* fixed test_fsdp_optimizer_utils.py
* minor
* fix assert
* lint
* remove nesting from the test
* step 1.5: remove the code related to unnecessary nesting support in FPW
* Update fairscale/nn/misc/flatten_params_wrapper.py
* address comment

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
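The FlatParameter encapsulates flattening several parameters into one contiguous tensor and handing shaped views back out. A toy illustration of that idea in plain PyTorch (not the fairscale class itself):

```python
import torch

# Flatten two parameters into one contiguous buffer, then rebuild views onto it.
params = [torch.randn(3, 4), torch.randn(5)]
flat = torch.cat([p.reshape(-1) for p in params])

views, offset = [], 0
for p in params:
    views.append(flat[offset: offset + p.numel()].view(p.shape))
    offset += p.numel()

assert all(v.shape == p.shape for v, p in zip(views, params))
```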
- 17 May, 2021 1 commit
Quentin Duval authored

* Save FSDP metadata for offline unflattening
* Complete the metadata-saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstructs a consolidated checkpoint from a sharded checkpoint
* Add a unit test to show how to use the function
* Code review + improvement of the unit tests
* Code review: extract clean_path
* Make metadata and consolidation of checkpoints work for flatten_parameters=False
* Add new unit test file in CI
* Complete changelog and fix mypy issues
* Add support for module buffers in the consolidation of sharded checkpoints
* Better support for module buffers: save them in the metadata
* Refactoring: use a data format for the metadata that is simpler to understand (move from object-of-arrays to array-of-objects format)
* Renaming to make code clearer
* Code review: in_temporary_directory rework and typo correction
* Renaming

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
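A hedged sketch of the offline-consolidation flow this work describes; the method names (local_state_dict, local_metadata_dict, consolidate_shard_weights) and the per-rank file layout are assumptions, and fsdp_model, rank and world_size are placeholders:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# On each rank, save the sharded weights plus the metadata needed to unflatten offline.
torch.save(fsdp_model.local_state_dict(), f"shard_{rank}.pt")
torch.save(fsdp_model.local_metadata_dict(), f"metadata_{rank}.pt")

# Later, on a single machine, rebuild one consolidated (unsharded) state dict.
shard_weights = [torch.load(f"shard_{r}.pt") for r in range(world_size)]
shard_metadata = [torch.load(f"metadata_{r}.pt") for r in range(world_size)]
full_state_dict = FSDP.consolidate_shard_weights(shard_weights, shard_metadata)
```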
- 14 May, 2021 1 commit
Shruti Bhosale authored

* fix saving and loading checkpoints with use_sharded_state=True
* mypy fix
* better fix of the infinite recursion
  - we need to specifically call FSDP.state_dict from its local state_dict
  - added a unit test that fails without the fix and works with the fix
  - fixed mypy for the overloaded functions
* make cpu-only fsdp work for state_dict at least

Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <m1n@fb.com>
- 11 May, 2021 1 commit
Min Xu authored

* [fix] FSDP forward pass overlap between compute and all-gather
  - many thanks to @cyanguwa for the report and @QuentinDuval for debugging it
  - a new unit test is added to check for this and to ensure we detect issues with overlapping and cpu/gpu blocking wait calls
* fix
* fix
* fix
* better assertion outputs
* fix format and tune all_gather mb for CI
* more tuning with non_flatten
* undo an accidental change
* tuning all gather mb and del model
* Update + fix overlapping test to use patched all_gather w/ delay (#672)
* fixing get_cycles_per_ms
* add get_smi_memory
* update the docstring

Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Myle Ott <myleott@fb.com>
- 05 May, 2021 1 commit
Min Xu authored

* [fix] add clear_autocast_cache flag
  - when training in AMP mode with fp32 weights, FSDP may need to optionally clear the autocast cache to avoid GPU OOM
  - this flag defaults to false; doing it automatically is a future TODO
  - also added a verbose flag to make print(fsdp_model) a bit shorter
  - updated the memory test to cover the new code
  - added a couple of useful functions in parallel.py and testing.py
* minor
* address comments
* format
* improve the test

Co-authored-by: Min Xu <min.xu@acm.org>
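A hedged usage sketch of the two flags added here, assuming they are plain constructor keyword arguments (the wrapped module is a toy one and the process group is assumed to be initialized):

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

fsdp_model = FSDP(
    torch.nn.Linear(8, 8).cuda(),
    clear_autocast_cache=True,  # opt in to clearing the autocast cache to curb OOM with fp32 weights under AMP
    verbose=True,               # controls how much detail print(fsdp_model) shows
)
```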
- 03 May, 2021 1 commit
Min Xu authored

* [minor] not creating a temp file on import
* address review
* Revert "address review" (this reverts commit f65eb9bc7f7ea8829b1ac0a369ef9a3e6b56420a)

Co-authored-by: Min Xu <min.xu@acm.org>
- 26 Apr, 2021 1 commit
Min Xu authored

* [fix]: let FSDP handle models with multiple forward passes and checkpointing
* try CI again
* save
* save
* fixed case with bn
* minor
* add the new file
* minor
* added test of a single case, runtime is about 50s
* enable all 8 test cases
* cleanup
* cleanup
* skip flatten case with 1.6 and 1.7
* minor

Co-authored-by: Min Xu <min.xu@acm.org>
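A hedged sketch of the situation this fix targets: a checkpointed submodule that runs more than once per forward pass, wrapped in FSDP (the TwoPass module is illustrative, and the checkpoint_wrapper import path is assumed):

```python
import torch
from fairscale.nn import checkpoint_wrapper
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

class TwoPass(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.block = checkpoint_wrapper(torch.nn.Linear(8, 8))

    def forward(self, x):
        # The same checkpointed block is used twice in a single forward pass.
        return self.block(torch.relu(self.block(x)))

# Assumes torch.distributed is already initialized on every rank.
model = FSDP(TwoPass().cuda())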
- 31 Mar, 2021 1 commit
Min Xu authored

[fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556)

* [fix] disable single rank process group for auto_wrap_bn
  - beefed up unit test with a regnet-like model
  - found that the single-rank process group is causing problems
  - disabled it to enable convergence tests on the vissl side
  - use `raise e from None` to get a better assertion output in testing.py
* [test] fix regnet test for ddp+mixed_precision
  - need AMP context in FSDP
  - work around a difference between ddp & fsdp when bias=True
  - fixed a bug in input data generation that caused different ranks to have the same data with a wrong iteration count
  - added TODO for needing a better loss and grad_scaler, and reduced iters so there is no nan
  - added a (disabled) debugging code path
* lint
* lint
* add scaler
* lint
* scaler
* add a real loss
* seeding in the ranks
* balance tests
* run AMP DDP==FSDP test only on cuda version 11 and up
* add relu inplace and comment
* make wrap_bn cover more cases in full precision mode
- 26 Mar, 2021 1 commit
Min Xu authored

- added DDP equivalency test
- added rmf and state_dict_norm functions to testing utils
- added more debugging output to objects_are_equal
- 19 Mar, 2021 1 commit
msbaines authored
- 12 Mar, 2021 1 commit
msbaines authored
- 11 Mar, 2021 1 commit
Benjamin Lefaudeux authored

* Adding a hard sync barrier before the broadcast; mostly useful for Gloo actually, NCCL is synced behind the scenes
* adding a proper unit test
* adding a unit test for https://github.com/facebookresearch/fairscale/pull/510
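A minimal sketch of the pattern in plain torch.distributed (the tensor is a toy one; the process group is assumed to be initialized):

```python
import torch
import torch.distributed as dist

tensor = torch.zeros(8)
# Make sure every rank has reached this point before broadcasting; Gloo does not give
# that guarantee by itself, while NCCL synchronizes behind the scenes.
dist.barrier()
dist.broadcast(tensor, src=0)
```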
- 04 Mar, 2021 1 commit
Sam Shleifer authored
- 26 Feb, 2021 2 commits
- 25 Feb, 2021 1 commit
Benjamin Lefaudeux authored

* bring back a fix from FSDP, may help a few existing users
- 23 Feb, 2021 1 commit
Myle Ott authored

Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper. Compared to PyTorch DDP:

* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
  * all-gather parameters at start of forward pass and start of backward pass
  * reduce-scatter grads at end of backward pass

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
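A hedged usage sketch of the drop-in replacement described above, using a toy module (the process group is assumed to be initialized; the optimizer is built after wrapping so that it sees the sharded parameters):

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(torch.nn.Linear(8, 8).cuda(), reshard_after_forward=True)  # True ~ ZeRO-3, False ~ ZeRO-2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

loss = model(torch.randn(4, 8, device="cuda")).sum()
loss.backward()   # grads are reduce-scattered at the end of the backward pass
optimizer.step()  # each rank updates only its own shard of the optimizer state
```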
- 19 Feb, 2021 1 commit
Benjamin Lefaudeux authored

* test with and without buckets for all the ShardedDDP unit tests
* parametrize all the things
* refactoring, adding even more combinations at times
* handle hosts not having cuda
- 18 Feb, 2021 1 commit
Benjamin Lefaudeux authored

* Adding multiple groups support to ShardedDDP + unit test
* adding gloo to the backends tested for multiple groups
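A hedged sketch of using ShardedDDP with an explicit, non-default process group; the `group` and `process_group` argument names and the OSS constructor are assumptions based on the fairscale API, and the rank list is a placeholder:

```python
import torch
import torch.distributed as dist
from fairscale.optim import OSS
from fairscale.nn.data_parallel import ShardedDataParallel

# Assumes torch.distributed is already initialized with at least 2 ranks.
model = torch.nn.Linear(8, 8)
group = dist.new_group(ranks=[0, 1])  # a non-default process group

# Confine the sharded optimizer state and ShardedDDP's gradient reduction to that group.
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1, group=group)
ddp_model = ShardedDataParallel(model, optimizer, process_group=group)
```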
- 12 Feb, 2021 1 commit
Benjamin Lefaudeux authored

* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed; make it optional so that you save some time
* Enabling accumulation tests
- 03 Feb, 2021 1 commit
Benjamin Lefaudeux authored

* precise skip, only if the agent has only a cpu
- 02 Feb, 2021 1 commit
Benjamin Lefaudeux authored

* no idea about the root issue, but it proved to be fairly narrow (gloo + cpu + python 3.8 + no cuda installed), so I guess that's out of scope for fairscale
- 29 Jan, 2021 1 commit
Min Xu authored

* [test]: test with py39 + torch 1.8 nightly
* version fix
* more fix
* fix version function for nightly version
* fix torch_pg build
* invalidate cache
* separate benchmark requirements
* comment
* fixed mypy
* fixed a test
- 21 Jan, 2021 3 commits
Benjamin Lefaudeux authored

* working around broken mypy

Myle Ott authored

Myle Ott authored
- 20 Jan, 2021 1 commit
Benjamin Lefaudeux authored

* using a global variable to share the init filename across processes
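For context, a minimal sketch of file-based rendezvous in plain torch.distributed: every process must pass the identical file path, which is why the commit shares it through a global (the single-process rank/world_size values below are just to keep the sketch runnable):

```python
import tempfile

import torch.distributed as dist

# In practice this file is created once and the same path is reused by every process.
shared_init_file = tempfile.NamedTemporaryFile(delete=False).name

dist.init_process_group(
    backend="gloo",
    init_method=f"file://{shared_init_file}",
    rank=0,
    world_size=1,
)
```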
- 11 Jan, 2021 1 commit
Benjamin Lefaudeux authored

* tentatively fixing the cpu version of the circleci jobs; now pipe tests are the last ones standing
* fixing oss backcompat, trying to fix rpc in old pytorch as well
* fixing the file-based init in torch 1.5