Commits · 8b59267b2e3f213b62b3aaa19e6dace0d5f10a26 · OpenDAS / fairscale

18 Mar, 2021 3 commits

[feat] FSDP: add auto_wrap_bn (#531) · 8b59267b

Min Xu authored Mar 18, 2021

* [feat] FSDP: add auto_wrap_bn

- add a utility function to handle wrapping of BN

* changelog

8b59267b

[feature] FSDP: enable pytorch SyncBN (#527) · 2fc1f6d8

Min Xu authored Mar 17, 2021

* [feature] FSDP: enable pytorch SyncBN

- not fully validated yet but at least not asserting
- this enables VISSL to move forward with its next PR

* add the test file

* changelog and lint

* addressed comment

2fc1f6d8

[refactor] removing duplicated tests (#529) · 98223763
Benjamin Lefaudeux authored Mar 17, 2021

98223763

17 Mar, 2021 1 commit
- [fix][SDP] Lightning-compat: deactivating buckets for a single rank, not useful (#514) · d3bfcbf5
  Benjamin Lefaudeux authored Mar 17, 2021
```
* Deactivating buckets for a single rank, not crashing but not useful
```
  d3bfcbf5
12 Mar, 2021 1 commit

[fix] FSDP: multi-pass autograd graph and mixed precision (#513) · 82986ca0

Min Xu authored Mar 12, 2021



* FSDP: multi-pass autograd graph and mixed precision

- added BACKWARD_PRE/POST checking
- better assert_state
- fixed issue of backward hook misfiring

* fix

* cleanup

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Myle Ott <myleott@fb.com>

82986ca0

11 Mar, 2021 1 commit

[fix][OSS] Adding a hard sync stream barrier before broadcast (#512) · c9fdf506

Benjamin Lefaudeux authored Mar 11, 2021

* Adding a hard sync barrier before the broadcast; mostly useful for Gloo, since NCCL is synced behind the scenes
* adding a proper unit test
* adding a unit test for https://github.com/facebookresearch/fairscale/pull/510

c9fdf506

09 Mar, 2021 2 commits
- [perf] Further improve performance for FSDP.no_sync (#502) · 0cbf3bab
  Myle Ott authored Mar 09, 2021
  
  0cbf3bab
- [fix] FSDP: fix MoE corner case (fixes #467) (#501) · 05ce7971
  Myle Ott authored Mar 08, 2021
  
  05ce7971
08 Mar, 2021 1 commit

[fix]: handle inputs with containers in mixed precision (#486) · 2e9a14e7

Min Xu authored Mar 08, 2021

* [fix]: handle inputs with containers

- this is an issue surfaced by VISSL as well
- fix seems to be super simple
- also cleaned up two tests with respect to multiple such tests
  running back to back (they don't do that presently)

* cleanup

* fix

* lint

2e9a14e7

06 Mar, 2021 1 commit
- [perf] FSDP: speed up no_sync and test communication volume (#470) · 1204c7cf
  Myle Ott authored Mar 06, 2021
  
  1204c7cf
05 Mar, 2021 1 commit
- [perf][minor] cache the rank lookups, small shardedddp perf fix (#474) · 131a5356
  Benjamin Lefaudeux authored Mar 05, 2021
```
* [perf][minor] cache the rank lookups, small shardedddp perf fix
* tiny improvement, code quality
```
  131a5356
04 Mar, 2021 1 commit
- [feat] add buffer_dtype kwarg for more control of batchnorm (#458) · b36e01d5
  Sam Shleifer authored Mar 04, 2021
  
  b36e01d5
03 Mar, 2021 1 commit
- [docs] minor doc update (#459) · 428110b8
  Min Xu authored Mar 02, 2021
  
  428110b8
02 Mar, 2021 1 commit
- [fix] Make state_dict all-gather FP32 params (#451) · d2924670
  Myle Ott authored Mar 02, 2021
  
  d2924670
01 Mar, 2021 2 commits

[chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7

Min Xu authored Mar 01, 2021

* [chores]: CI py39 on GPU and more efficiency

* add test list files

* fix

* add test list files

* split benchmark run into 2 runs

* fix 1.8 version and balance benchmarks

* fix

* fix

* fix

* fix

* recording tests

* py39 install fix

* test again

* move tests

* reorg tests

* skip tests for torch 1.8 due to an upstream bug

* removed __init__.py from tests since it confuses pytest

* Revert "removed __init__.py from tests since it confuses pytest"

This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.

* don't include __init__ in file list

* notes on __init__.py and added missing ones

* fixed mypy in a test file

* balance test runtime

* better pip install

* balance more

* pip fix

* balance

* balance more, all tests should finish within 20m now

* minor license update

* trying cu102

* more doc and addressed Ben's comments

* debugging

* debugging

* better capture the errors

* debugging

* fix pyenv command

* add universe repo

* update to cuda 11 for 171

* add a test file, improved the checking script

5eb6b8c7

[test] FSDP: add the failing test for #421 (#453) · 5ecac15a

Min Xu authored Mar 01, 2021



* [test] FSDP: add the failing test for #421

* skip on 1.5

* better skipping

* Update tests/nn/data_parallel/test_fsdp_grad_scaler.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

5ecac15a

27 Feb, 2021 1 commit

[fix] FSDP: fix the corner case of all params are in the children (#441) · b75a5e26

Min Xu authored Feb 26, 2021

* [fix] FSDP corner case of all params are in the children

* lint

* fix

* tradeoff

* fix doc build

* review comments

b75a5e26

26 Feb, 2021 3 commits
- [fix] fix FSDP state_dict/load_state_dict for nested wrapped instances (#440) · b6dc98cf
  Myle Ott authored Feb 26, 2021
  
  b6dc98cf
- [fix] Fix nested FlattenParamsWrapper state_dict/load_state_dict (#434) · 506d6209
  Myle Ott authored Feb 26, 2021
  
  506d6209
- [feat]: add summon_full_params context mgr (#433) · 77f92b38
  Min Xu authored Feb 25, 2021
```
* [feat]: add summon_full_params context mgr

* fix

* fix

* addressed comments

* fixed the state_dict copy

* lint
```
  77f92b38
25 Feb, 2021 1 commit
- [ShardedDDP][Minor] Backport a bucket flush fix from FSDP, may help a few existing users (#435) · 7ee228bf
  Benjamin Lefaudeux authored Feb 25, 2021
```
* bring back a fix from FSDP, may help a few existing users
```
  7ee228bf
24 Feb, 2021 1 commit
- [fix]: Fix non-float buffers in FSDP (#427) · 9e0df348
  Myle Ott authored Feb 23, 2021
  
  9e0df348
23 Feb, 2021 2 commits

[perf][ShardedDDP] fp16 gradient reduce (#411) · d52d2186

Benjamin Lefaudeux authored Feb 22, 2021

* POC, testing against the DDP comm hook when available
* docs, adding a reference to DDP's compress hook
* updating changelog, prep for v0.1.8 release

d52d2186

Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e

Myle Ott authored Feb 22, 2021

Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.

Compared to PyTorch DDP:
* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
    * all-gather parameters at start of forward pass and start of backward pass
    * reduce-scatter grads at end of backward pass

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
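
The communication comparison in the commit message above can be checked with a little arithmetic. The sketch below is not part of fairscale. It assumes ring-style collectives, where one all-gather or one reduce-scatter moves roughly `(N - 1) / N` of the model size per GPU, and an all-reduce costs one of each. The function names are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def ring_cost(world_size: int) -> Fraction:
    """Per-GPU traffic of one all-gather or one reduce-scatter,
    in units of model size (assumes a ring-style collective)."""
    if world_size < 2:
        raise ValueError("communication cost is only meaningful for 2+ workers")
    return Fraction(world_size - 1, world_size)


def ddp_volume(world_size: int) -> Fraction:
    # DDP all-reduces the gradients: one reduce-scatter plus one all-gather.
    return 2 * ring_cost(world_size)


def fsdp_volume(world_size: int, reshard_after_forward: bool) -> Fraction:
    # Parameters are all-gathered at the start of the forward pass, and again
    # at the start of the backward pass if they were resharded after forward.
    all_gathers = 2 if reshard_after_forward else 1
    # Gradients are reduce-scattered once at the end of the backward pass.
    reduce_scatters = 1
    return (all_gathers + reduce_scatters) * ring_cost(world_size)


def relative_overhead(world_size: int, reshard_after_forward: bool) -> Fraction:
    """Extra FSDP traffic relative to DDP (0 means identical cost)."""
    return fsdp_volume(world_size, reshard_after_forward) / ddp_volume(world_size) - 1


if __name__ == "__main__":
    for flag in (False, True):
        print(f"reshard_after_forward={flag}: overhead vs DDP = {relative_overhead(8, flag)}")
```

Under these assumptions, `reshard_after_forward=False` matches DDP exactly, and `reshard_after_forward=True` adds one extra all-gather, which is where the 50% figure comes from. The ratio does not depend on the number of workers, since the `(N - 1) / N` factor cancels.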

15512d9e

19 Feb, 2021 1 commit

[feature] Unit test with and without buckets for all ShardedDDP unit tests (#400) · 175fdeb0

Benjamin Lefaudeux authored Feb 19, 2021

* test with and without buckets for all the shardedDDP unit tests
* parametrize all the things
* refactoring, adding even more combinations at times
* handle hosts not having cuda

175fdeb0

18 Feb, 2021 2 commits
- [feat][ShardedDDP] Support multiple groups (#394) · 205af8c2
  Benjamin Lefaudeux authored Feb 18, 2021
```
* Adding multiple groups support to ShardedDDP + unit test
* adding gloo to the backends tested for multiple groups
```
  205af8c2
- [fix][minor] ShardedDDP train/eval modes (#393) · ef7146d5
  Benjamin Lefaudeux authored Feb 18, 2021
```
* [fix] ShardedDDP train/eval modes
* Update CHANGELOG.md
```
  ef7146d5
17 Feb, 2021 1 commit
- [feat][ShardedDDP] manual reduce option (#389) · 47042917
  Benjamin Lefaudeux authored Feb 16, 2021
```
* initial implementation, with unit test and assert
* added changelog and better debug string
```
  47042917
12 Feb, 2021 1 commit

[feature-fix-refactor][ShardedDDP] Make it possible to change trainability graph on the fly (#369) · 13445c55

Benjamin Lefaudeux authored Feb 11, 2021

* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed. Make it optional so that you save some time
* Enabling accumulation tests

13445c55

04 Feb, 2021 1 commit
- [feat] ShardedDDP : Adding a proper DDP parity / AMP unit test, overdue (#361) · 5c3ff9bd
  Benjamin Lefaudeux authored Feb 04, 2021
```
* Adding a proper ddp parity / AMP unit test, overdue
* catch non-AMP pytorch
```
  5c3ff9bd
03 Feb, 2021 1 commit
- [fix] ShardedDDP - properly handle post device change (#353) · a265586b
  Benjamin Lefaudeux authored Feb 02, 2021
```
* adding the .to(device) support + unit testing
* doc update
```
  a265586b
02 Feb, 2021 1 commit

[fix] ShardedDDP - cpu testfix - remove Gloo/CPU (#350) · c2dd6c34

Benjamin Lefaudeux authored Feb 01, 2021

* no idea about the root issue, but it proved to be fairly narrow (gloo + cpu + python3.8 + no cuda installed), so I guess that's out of scope for fairscale

c2dd6c34

15 Jan, 2021 1 commit
- [feat][ShardedDDP] Support the original module's attributes (#309) · 3e2547c3
  Benjamin Lefaudeux authored Jan 15, 2021
```
* minor, but ease of life, one less papercut
```
  3e2547c3
05 Jan, 2021 1 commit

[fix] Flaky tests (#283) · 79365ee6

Benjamin Lefaudeux authored Jan 04, 2021

* adding the pytest timeout plugin to properly root out hanging tests
* removing redundant code, slightly more reasonable timeout, works on single cuda
* finding the root bug for some of the cpu hangs, rpc init
* propagating all the rpc init test changes to the pipe and model parallel tests

79365ee6

02 Jan, 2021 1 commit
- [fix] Typo in ShardedDDP unit test (#282) · 84a3bdbe
  Benjamin Lefaudeux authored Jan 01, 2021
```
* fix typo, backend for CPU test
```
  84a3bdbe
30 Dec, 2020 1 commit
- [feat] Add Torch Sync Batchnorm handle in sharded DDP (#265) · 1c8d219d
  Sean Naren authored Dec 30, 2020
```
* Add function to add handle for sync BN
* Add test to ensure batch norm handles have been added
```
  1c8d219d
19 Dec, 2020 1 commit

[OSS] Getting rid of the "should bucket" hash table, just use a list + non... · ca74ee22

Benjamin Lefaudeux authored Dec 19, 2020

[OSS] Getting rid of the "should bucket" hash table, just use a list + non trainable params fix (#259)

* Getting rid of the "should bucket" hash table, just use a list
Properly handle all params, with or without requires_grad

* make sure that this case is unit tested

ca74ee22

10 Dec, 2020 1 commit

[fix] Check ShardedDDP / DDP parity + bugfix (#242) · 138b2033

Benjamin Lefaudeux authored Dec 09, 2020

* unit test checking ddp and sharded_ddp equivalence, reproducing the issue that Sean spotted
* fixing the issue, not counting requests in flight properly
* adding a multiple optimizers case

138b2033

04 Dec, 2020 1 commit

[fix] Fix iGPT buckets with ShardedDDP (#223) · 6d223777

Benjamin Lefaudeux authored Dec 03, 2020

* proper unit testing, but no other solution than disabling bucketing for now, couple of options tested do not work

6d223777

21 Nov, 2020 1 commit

[feat] ShardedDataParallel with autoreduce (#157) · ad933b34

Benjamin Lefaudeux authored Nov 21, 2020

* rewrite using autograd and Variable execution queue to make the reduce automatic
* share buckets with OSS to remove duplication
* some speed still likely on the table since the speed vs. bucketing does not match expectations, could be a follow up

ad933b34