- 23 Feb, 2021 11 commits
-
-
Min Xu authored
* [test]: add peak mem in checkpoint test
* more debugging
* new test
* more fix
* better collection of debug in case of future failures
* update the comment
* typo
* comment
* clarify
* better wording
-
Benjamin Lefaudeux authored
* v0.3.0 it is, celebration time
-
anj-s authored
* move experimental to the fairscale repo
* lint error fixes
* modify test imports
* lint error fixes
* lint errors

Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* POC, testing against the DDP comm hook when available
* docs, adding a reference to DDP's compress hook
* updating changelog, prep for v0.1.8 release
-
Myle Ott authored
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
* [bug]: not all CUDA memory is freed when model is deleted
* fixed memory leak - without this, peak memory will be high when more than one model is trained (i.e. the first model leaves stuff around, pushing up the peak memory when the second model runs)
* addressed comments
* fix
* changelog
-
Min Xu authored
-
Myle Ott authored
Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper. Compared to PyTorch DDP:

* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
  * all-gather parameters at start of forward pass and start of backward pass
  * reduce-scatter grads at end of backward pass

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
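To make the "drop-in replacement" point concrete, here is a minimal sketch (not taken from the release notes) of wrapping a model with fairscale's FSDP. It assumes `torch.distributed` is already initialized (e.g. via `torchrun`) and a CUDA device is available; the toy model and the `wrap` helper are illustrative only.

```python
# Minimal sketch: swap PyTorch DDP for fairscale's FullyShardedDataParallel.
# Assumes torch.distributed is already initialized and CUDA is available.
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

def wrap(module: torch.nn.Module, zero3_style: bool = True) -> FSDP:
    # reshard_after_forward=True frees the full parameters after the forward
    # pass and all-gathers them again during backward (ZeRO-3 style: ~50% more
    # communication, lower peak memory). False keeps them gathered, matching
    # DDP's communication volume (ZeRO-2 style).
    return FSDP(module, reshard_after_forward=zero3_style)

model = wrap(torch.nn.Linear(1024, 1024).cuda())
# Build the optimizer from the wrapped module, since FSDP owns the sharded,
# flattened parameters after wrapping.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```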
-
- 22 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding an assert + corresponding unit test
* updated changelog
* adjusting the adascale tests
-
- 19 Feb, 2021 4 commits
-
-
Benjamin Lefaudeux authored
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
Min Xu authored
* [docs]: add checkpoint_wrapper and many small fixes
* update copyright year
-
Benjamin Lefaudeux authored
* test with and without buckets for all the shardedDDP unit tests
* parametrize all the things
* refactoring, adding even more combinations at times
* handle hosts not having cuda
-
Min Xu authored
-
- 18 Feb, 2021 3 commits
-
-
Min Xu authored
* [fix] expose checkpoint_wrapper
* fix formatting
-
Benjamin Lefaudeux authored
* Adding multiple groups support to ShardedDDP + unit test
* adding gloo to the backends tested for multiple groups
-
Benjamin Lefaudeux authored
* [fix] ShardedDDP train/eval modes
* Update CHANGELOG.md
-
- 17 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* initial implementation, with unit test and assert
* added changelog and better debug string
-
- 14 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* WIP, needs to be fixed!
* should be a fix, many thanks Weiyi Zheng
* slightly better unit test, sorting the states on the way out
* reproducing the issue from Weiyi in a unit test, and finally properly fixing
* fixing unit test on pytorch1.5 - original loss diff 26.404895782470703 - 26.404342651367188
-
- 12 Feb, 2021 3 commits
-
-
Benjamin Lefaudeux authored
This reverts commit 8be9d930.
-
Benjamin Lefaudeux authored
* many thanks Weiyi Zheng
-
Benjamin Lefaudeux authored
* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed. Make it optional so that you save some time
* Enabling accumulation tests
-
- 11 Feb, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* super minor, opportunistic micro optim
-
Benjamin Lefaudeux authored
* v0.1.6
-
- 10 Feb, 2021 2 commits
-
-
Myle Ott authored
* Add fairscale.utils.containers
  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
* Add fairscale.nn.misc.checkpoint_activations
  Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Leonard Lausen authored
-
- 09 Feb, 2021 1 commit
-
-
msbaines authored
-
- 08 Feb, 2021 2 commits
-
-
msbaines authored
-
Benjamin Lefaudeux authored
* flat params all along, way simpler
* updating the docstring
-
- 05 Feb, 2021 2 commits
-
-
Benjamin Lefaudeux authored
Fix a broken earlier commit that only worked for the first step.
-
Benjamin Lefaudeux authored
* minor * minor
-
- 04 Feb, 2021 6 commits
-
-
msbaines authored
-
Benjamin Lefaudeux authored
Cache this iterator for an easy speed-up.
-
Benjamin Lefaudeux authored
* Adding a proper ddp parity / AMP unit test, overdue
* catch non-AMP pytorch
-
msbaines authored
-
msbaines authored
It is not currently being used, so we can simplify the interface by removing it.
-
msbaines authored
-
- 03 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* restoring the regression test, adding a test of the for_each optims
* fix the regression test on circleci
* removing unused flags
-