23 Feb, 2021 (2 commits)
    • [docs] fsdp changelog and doc (#414) · 2b15720b
      Min Xu authored
    • Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e
      Myle Ott authored
      Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper (a minimal usage sketch follows the comparison below).
      
      Compared to PyTorch DDP:
      * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
      * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
      * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
          * all-gather parameters at start of forward pass and start of backward pass
          * reduce-scatter grads at end of backward pass
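
      As a rough illustration, here is a minimal sketch of wrapping a model in FSDP, assuming a fairscale build that exposes `fairscale.nn.data_parallel.FullyShardedDataParallel`, an already-initialized `torch.distributed` process group, and illustrative model/optimizer choices:

      ```python
      import torch
      from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

      # Assumes torch.distributed.init_process_group(...) has already run
      # and this process has a GPU assigned (toy model for illustration).
      model = torch.nn.Linear(1024, 1024).cuda()

      # reshard_after_forward=True frees the full parameters after the
      # forward pass and re-gathers them for backward (ZeRO-3-like, ~50%
      # more communication); False keeps them resident after forward
      # (ZeRO-2-like, DDP-equivalent communication cost).
      model = FSDP(model, reshard_after_forward=True)

      optim = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative
      x = torch.randn(8, 1024).cuda()
      loss = model(x).sum()
      loss.backward()   # gradients are reduce-scattered across workers
      optim.step()      # each worker updates only its own parameter shard
      ```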
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>