- 06 Sep, 2021 1 commit
Min Xu authored
[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)
* changelog; mypy; oss cleanup
* more broadcast_object cleanup in FSDP (see the sketch below)
* one more mypy fix
* retire pytorch 1.6 from circleci, add new nightly, add 1.8 LTS and 1.9 stable release
* update torch version for LTS
* minor fixes
* update cache key
* trying newer gpu VMs
* bump the cache
* update to gpu.medium, which should be 2 GPUs
* update nightly version
* add pre-commit instructions
* fixed CHANGELOG after merging
* updated to a newer nightly
* retained the older broadcast function in oss.py for older GPUs
* fixed a bug
* added a comment
* fixing a test for pytorch 1.10
* testing a fix
* Update fairscale/optim/oss.py
* Update CONTRIBUTING.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>
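For context on the broadcast_object cleanup: once PyTorch 1.6 is retired from CI, small picklable objects can be broadcast with the built-in collective instead of a hand-rolled tensor round-trip. A minimal sketch, assuming an initialized process group; `broadcast_object` is an illustrative helper name, not fairscale's exact function:

```python
import torch.distributed as dist

def broadcast_object(obj, src_rank: int = 0):
    """Broadcast a picklable object from src_rank to every rank (hypothetical helper)."""
    holder = [obj if dist.get_rank() == src_rank else None]
    # broadcast_object_list is available from PyTorch 1.8 onward, which is
    # what makes this cleanup possible once 1.6 support is dropped.
    dist.broadcast_object_list(holder, src=src_rank)
    return holder[0]
```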
- 27 Jul, 2021 1 commit
Benjamin Lefaudeux authored
- 26 Jun, 2021 1 commit
Pavel Belevich authored
- 08 May, 2021 1 commit
anj-s authored
* rename and move optim/utils.py
* attach the new file
- 06 Apr, 2021 1 commit
Benjamin Lefaudeux authored
- 05 Apr, 2021 1 commit
Benjamin Lefaudeux authored
* making APIs more private
* linting
- 04 Apr, 2021 1 commit
Benjamin Lefaudeux authored
- 19 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* param buckets
* unifying the buckets
- 18 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* enabling disabled tests
- 17 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 15 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* extending the current state_dict interface, making it possible to do everything in a single call and to checkpoint on all ranks
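A minimal sketch of the single-call checkpoint flow this enables, assuming fairscale's `OSS` optimizer wrapper and an initialized process group; exact keyword names may vary between releases:

```python
import torch
from fairscale.optim import OSS

model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)

# Gather every rank's shard onto one rank, then fetch the full state in one call.
optimizer.consolidate_state_dict(recipient_rank=0)
if torch.distributed.get_rank() == 0:
    torch.save(optimizer.state_dict(), "optim_full.pt")
```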
- 12 Mar, 2021 1 commit
msbaines authored
- 11 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* Adding a hard sync barrier before the broadcast, mostly useful for Gloo actually; NCCL is synced behind the scenes (see the sketch below)
* adding a proper unit test
* adding a unit test for https://github.com/facebookresearch/fairscale/pull/510
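A minimal sketch of the pattern, in plain `torch.distributed`; with Gloo the explicit barrier guarantees every rank has reached the broadcast, while NCCL already serializes its collectives on the CUDA stream:

```python
import torch
import torch.distributed as dist

def synced_broadcast(tensor: torch.Tensor, src_rank: int) -> None:
    dist.barrier()                        # hard sync point across all ranks
    dist.broadcast(tensor, src=src_rank)  # now safe even on Gloo
```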
- 09 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 05 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* change empty shard handling for OSS, do not rely on asserts
* code review
- 23 Feb, 2021 1 commit
Myle Ott authored
Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper. Compared to PyTorch DDP:
* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
  * all-gather parameters at start of forward pass and start of backward pass
  * reduce-scatter grads at end of backward pass

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
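A minimal sketch of the drop-in swap, assuming a CUDA machine with an initialized process group; `reshard_after_forward` trades communication for memory as described above:

```python
import torch
from fairscale.nn import FullyShardedDataParallel as FSDP

model = torch.nn.Linear(1024, 1024).cuda()
# reshard_after_forward=False ~ ZeRO-2: DDP-like communication cost.
# reshard_after_forward=True  ~ ZeRO-3: ~50% more communication, lower memory,
# since parameters are re-gathered at the start of forward and backward.
model = FSDP(model, reshard_after_forward=True)
```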
- 22 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* adding an assert + corresponding unit test
* updated changelog
* adjusting the adascale tests
- 14 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* WIP, needs to be fixed!
* should be a fix, many thanks Weiyi Zheng
* slightly better unit test, sorting the states on the way out
* reproducing the issue from Weiyi in a unit test, and finally properly fixing
* fixing unit test on pytorch 1.5 - original loss diff 26.404895782470703 - 26.404342651367188
- 12 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed; make it optional so that you save some time (see the sketch below)
* Enabling accumulation tests
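A hedged sketch of the refresh hook, assuming fairscale's `ShardedDataParallel` and an existing `OSS` optimizer; method and argument names may differ across versions:

```python
import torch
from fairscale.nn.data_parallel import ShardedDataParallel

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 8))
ddp_model = ShardedDataParallel(model, optimizer)  # `optimizer`: an OSS instance

# Freeze part of the model, then refresh the wrapper's assumptions once,
# instead of paying for a re-derivation on every step.
for p in model[0].parameters():
    p.requires_grad = False
ddp_model.refresh_trainable()
```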
- 05 Feb, 2021 1 commit
Benjamin Lefaudeux authored
Fix a broken earlier commit, which only worked for the first step
- 03 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* precise test skip, only when the agent is CPU-only
- 02 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* adding a test to prove the interoperability with upstream pytorch
* updating the changelog
* eager state pruning
* pytorch 1.5 compat
- 27 Jan, 2021 1 commit
Benjamin Lefaudeux authored
- 20 Jan, 2021 1 commit
Benjamin Lefaudeux authored
- 11 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* tentatively fixing the cpu version of circleci jobs, now pipe tests are the last ones standing
* fixing oss backcompat, trying to fix rpc in old pytorch also
* fixing the file based init in torch 1.5
- 08 Jan, 2021 3 commits
Benjamin Lefaudeux authored
* adding a parity unit test
* code review, better testing, use torch defaults and check for the loss, log world size
Benjamin Lefaudeux authored
Joshua Meier authored
* add additional unit test
* support model parallelism in oss
- 05 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests
* removing redundant code, slightly more reasonable timeout, works on single cuda
* finding the root bug for some of the cpu hangs, rpc init
* propagating all the rpc init test changes to the pipe and model parallel tests
- 29 Dec, 2020 1 commit
Joshua Meier authored
- 22 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* fix, one liner
* adjust so that frozen trunks still get spread, even if this should have little consequence
* removing dead code, hopeful unit test fix
* now with some linting
* adding a proper unit test case
- 06 Dec, 2020 1 commit
Min Xu authored
- 16 Nov, 2020 1 commit
Benjamin Lefaudeux authored
Add a gradient-clipping util, equivalent to torch's but aware of the sharded states. Add a corresponding unit test
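A minimal sketch of the sharded-aware clipping, assuming an `OSS` optimizer as in the earlier sketches; the call mirrors `torch.nn.utils.clip_grad_norm_` but reduces the gradient norm across all shards before clipping:

```python
loss = model(inputs).sum()                # `model` and `inputs` as set up earlier
loss.backward()
optimizer.clip_grad_norm(max_norm=1.0)    # OSS method; norm is computed across ranks
optimizer.step()
```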
- 06 Nov, 2020 1 commit
Benjamin Lefaudeux authored
- 14 Oct, 2020 2 commits
Benjamin Lefaudeux authored
* fixing the issue wrt Apex, validated with Latte, Classy would need another pass
msbaines authored
- 08 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* new unit test to catch rank issues in OSS
- 15 Sep, 2020 2 commits
Benjamin Lefaudeux authored
Return either the local or the global state when queried, depending on whether a prior consolidation took place
Benjamin Lefaudeux authored
Make OSS compatible with optimizers which do not support the closure argument
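A hedged sketch of the compatibility shim described, not necessarily fairscale's exact code: forward the closure only when one was actually supplied, so wrapped optimizers that do not take a closure keep working:

```python
from typing import Callable, Optional

def step(self, closure: Optional[Callable] = None):
    # Only hand the closure down when the caller provided one, so the wrapped
    # optimizer never sees an argument it may not support.
    if closure is not None:
        return self.optim.step(closure=closure)
    return self.optim.step()
```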
- 09 Sep, 2020 1 commit
Benjamin Lefaudeux authored
Change the structure of the returned state dict with respect to the param_groups, making it closer to what a vanilla optimizer would return (un-sharding them). Shard again when loading