1. 23 Feb, 2021 1 commit
    • Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e
      Myle Ott authored
      Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.
      
      Compared to PyTorch DDP:
      * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
      * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
      * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
          * all-gather parameters at start of forward pass and start of backward pass
          * reduce-scatter grads at end of backward pass
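
      As an illustration, here is a minimal training-step sketch (assuming a `torch.distributed` process group is already initialized, e.g. via `torchrun`; the model, sizes, and optimizer are placeholders):

      ```python
      import torch
      import torch.nn as nn
      from fairscale.nn import FullyShardedDataParallel as FSDP

      # Assumes torch.distributed.init_process_group() has already run
      # and this rank is bound to one GPU.
      model = nn.Sequential(
          nn.Linear(1024, 1024),
          nn.ReLU(),
          nn.Linear(1024, 1024),
      ).cuda()

      # reshard_after_forward=True trades ~50% more communication for lower
      # peak memory (ZeRO-3-like); False matches DDP's comm cost (ZeRO-2-like).
      model = FSDP(model, reshard_after_forward=True)

      # Build the optimizer after wrapping, over the flattened, sharded params.
      optim = torch.optim.Adam(model.parameters(), lr=1e-4)

      x = torch.randn(8, 1024, device="cuda")
      loss = model(x).sum()
      loss.backward()  # grads are reduce-scattered at the end of backward
      optim.step()     # each rank updates only its own parameter shard
      ```
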
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
  2. 19 Feb, 2021 1 commit
  3. 18 Feb, 2021 2 commits
  4. 17 Feb, 2021 1 commit
  5. 12 Feb, 2021 1 commit
  6. 10 Feb, 2021 1 commit
  7. 09 Feb, 2021 1 commit
  8. 04 Feb, 2021 4 commits
  9. 03 Feb, 2021 2 commits
  10. 02 Feb, 2021 1 commit
  11. 30 Jan, 2021 1 commit
  12. 29 Jan, 2021 1 commit
  13. 27 Jan, 2021 1 commit
  14. 23 Jan, 2021 1 commit
  15. 21 Jan, 2021 3 commits
  16. 15 Jan, 2021 1 commit
  17. 11 Jan, 2021 1 commit
  18. 05 Jan, 2021 1 commit
    • [fix] Flaky tests (#283) · 79365ee6
      Benjamin Lefaudeux authored
      * adding the pytest-timeout plugin to properly root out hanging tests
      * removing redundant code and using a slightly more reasonable timeout; works on a single CUDA device
      * finding the root cause of some of the CPU hangs: RPC init
      * propagating all the RPC init test changes to the pipe and model parallel tests
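
      For context, a minimal sketch of how a pytest-timeout marker bounds a potentially hanging test (the test name, body, and timeout value here are illustrative, not from this commit):

      ```python
      import pytest

      # With the pytest-timeout plugin installed, a hanging test is killed
      # and reported as a failure instead of stalling the whole CI run.
      @pytest.mark.timeout(60)
      def test_rpc_init_does_not_hang():
          # ... initialize RPC / process groups and exercise the code path ...
          assert True
      ```
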
  19. 02 Jan, 2021 1 commit
  20. 30 Dec, 2020 1 commit
  21. 29 Dec, 2020 1 commit
  22. 28 Dec, 2020 1 commit
  23. 19 Dec, 2020 1 commit
  24. 10 Dec, 2020 1 commit
  25. 04 Dec, 2020 1 commit
  26. 01 Dec, 2020 2 commits
  27. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and the Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speed is likely still on the table, since bucketing does not yet deliver the expected speedup; could be a follow-up
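
      A minimal sketch of pairing ShardedDataParallel with the OSS optimizer wrapper (assumes an initialized process group; the model and hyperparameters are placeholders):

      ```python
      import torch
      import torch.nn as nn
      from fairscale.nn.data_parallel import ShardedDataParallel
      from fairscale.optim.oss import OSS

      # Assumes torch.distributed.init_process_group() has already run.
      model = nn.Linear(1024, 1024).cuda()

      # OSS shards optimizer state across ranks; ShardedDataParallel shares
      # its buckets and reduces gradients automatically during backward.
      optim = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=0.1)
      model = ShardedDataParallel(model, optim)

      loss = model(torch.randn(8, 1024, device="cuda")).sum()
      loss.backward()  # each grad is reduced to the rank that owns its state
      optim.step()
      ```
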
  28. 18 Nov, 2020 1 commit
  29. 11 Nov, 2020 2 commits
  30. 10 Nov, 2020 1 commit
    • Single-process control via PipeRPCWrapper (#156) · 5d4f50fb
      Tom Birch authored
      Adds support for:
      * Reused layers (e.g. for weight sharing)
      * Lazily-constructed layers
      * Single-process control via PipeRPCWrapper
      * PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive
      
      Also added examples for multi-process pipelines and PipeRPCWrapper.
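
      For context, a minimal sketch of the single-process `fairscale.nn.Pipe` API that this work extends (assumes two CUDA devices; layer sizes and chunk count are placeholders; see the added examples for actual PipeRPCWrapper usage):

      ```python
      import torch
      import torch.nn as nn
      from fairscale.nn import Pipe

      # A toy two-stage pipeline: balance=[2, 2] puts two layers on each device.
      model = nn.Sequential(
          nn.Linear(1024, 1024),
          nn.ReLU(),
          nn.Linear(1024, 1024),
          nn.ReLU(),
      )
      model = Pipe(model, balance=[2, 2], chunks=4)  # split each batch into 4 micro-batches

      x = torch.randn(32, 1024, device="cuda:0")  # input lives on the first stage's device
      out = model(x)
      ```
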
  31. 30 Oct, 2020 1 commit