- 23 Feb, 2021 11 commits
-
-
Min Xu authored
* [test]: add peak mem in checkpoint test
* more debugging
* new test
* more fix
* better collection of debug in case of future failures
* update the comment
* typo
* comment
* clarify
* better wording
-
Benjamin Lefaudeux authored
* v0.3.0 it is, celebration time
-
anj-s authored
* move experimental to the fairscale repo
* lint error fixes
* modify test imports
* lint error fixes
* lint errors

Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* POC, testing against the DDP comm hook when available
* docs, adding a reference to DDP's compress hook
* updating changelog, prep for v0.1.8 release
-
Myle Ott authored
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
* [bug]: not all CUDA memory is freed when model is deleted
* fixed memory leak - without this, peak memory will be high when more than one model is trained (i.e. the first model leaves stuff around, pushing up the peak memory when the second model runs)
* addressed comments
* fix
* changelog
-
Min Xu authored
-
Myle Ott authored
Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper. Compared to PyTorch DDP:

* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
  * all-gather parameters at start of forward pass and start of backward pass
  * reduce-scatter grads at end of backward pass

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
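To make the "drop-in replacement" point concrete, here is a minimal sketch (not taken from the release notes) of wrapping a model with fairscale's FSDP. It assumes `torch.distributed` is already initialized (e.g. via `torchrun`) and a CUDA device is available; the toy model and the `wrap` helper are illustrative only.

```python
# Minimal sketch: swap PyTorch DDP for fairscale's FullyShardedDataParallel.
# Assumes torch.distributed is already initialized and CUDA is available.
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

def wrap(module: torch.nn.Module, zero3_style: bool = True) -> FSDP:
    # reshard_after_forward=True frees the full parameters after the forward
    # pass and all-gathers them again during backward (ZeRO-3 style: ~50% more
    # communication, lower peak memory). False keeps them gathered, matching
    # DDP's communication volume (ZeRO-2 style).
    return FSDP(module, reshard_after_forward=zero3_style)

model = wrap(torch.nn.Linear(1024, 1024).cuda())
# Build the optimizer from the wrapped module, since FSDP owns the sharded,
# flattened parameters after wrapping.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```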
-
- 22 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding an assert + corresponding unit test
* updated changelog
* adjusting the adascale tests
-
- 19 Feb, 2021 4 commits
-
-
Benjamin Lefaudeux authored
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
Min Xu authored
* [docs]: add checkpoint_wrapper and many small fixes
* update copyright year
-
Benjamin Lefaudeux authored
* test with and without buckets for all the shardedDDP unit tests
* parametrize all the things
* refactoring, adding even more combinations at times
* handle hosts not having cuda
-
Min Xu authored
-
- 18 Feb, 2021 3 commits
-
-
Min Xu authored
* [fix] expose checkpoint_wrapper
* fix formatting
-
Benjamin Lefaudeux authored
* Adding multiple groups support to ShardedDDP + unit test
* adding gloo to the backends tested for multiple groups
-
Benjamin Lefaudeux authored
* [fix] ShardedDDP train/eval modes
* Update CHANGELOG.md
-
- 17 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* initial implementation, with unit test and assert
* added changelog and better debug string
-
- 14 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* WIP, needs to be fixed!
* should be a fix, many thanks Weiyi Zheng
* slightly better unit test, sorting the states on the way out
* reproducing the issue from Weiyi in a unit test, and finally properly fixing
* fixing unit test on pytorch1.5 - original loss diff 26.404895782470703 - 26.404342651367188
-
- 12 Feb, 2021 3 commits
-
-
Benjamin Lefaudeux authored
This reverts commit 8be9d930.
-
Benjamin Lefaudeux authored
* many thanks Weiyi Zheng
-
Benjamin Lefaudeux authored
* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed. Make it optional so that you save some time
* Enabling accumulation tests
-
- 11 Feb, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* super minor, opportunistic micro optim
-
Benjamin Lefaudeux authored
* v0.1.6
-
- 10 Feb, 2021 2 commits
-
-
Myle Ott authored
* Add fairscale.utils.containers
  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
* Add fairscale.nn.misc.checkpoint_activations
  Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Leonard Lausen authored
-
- 09 Feb, 2021 1 commit
-
-
msbaines authored
-
- 08 Feb, 2021 2 commits
-
-
msbaines authored
-
Benjamin Lefaudeux authored
* flat params all along, way simpler
* updating the docstring
-
- 05 Feb, 2021 2 commits
-
-
Benjamin Lefaudeux authored
Fix a broken earlier commit that only worked for the first step.
-
Benjamin Lefaudeux authored
* minor * minor
-
- 04 Feb, 2021 6 commits
-
-
msbaines authored
-
Benjamin Lefaudeux authored
Cache this iterator for an easy speed-up.
-
Benjamin Lefaudeux authored
* Adding a proper ddp parity / AMP unit test, overdue
* catch non-AMP pytorch
-
msbaines authored
-
msbaines authored
It is not currently being used, so we can simplify the interface by removing it.
-
msbaines authored
-
- 03 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* restoring the regression test, adding a test of the for_each optims
* fix the regression test on circleci
* removing unused flags
-