- 27 Apr, 2021 (1 commit)
  - Benjamin Lefaudeux authored
- 21 Apr, 2021 (1 commit)
  - Benjamin Lefaudeux authored
- 06 Apr, 2021 (1 commit)
  - Benjamin Lefaudeux authored
- 05 Apr, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * making APIs more private
    * linting
- 04 Apr, 2021 (1 commit)
  - Benjamin Lefaudeux authored
- 19 Mar, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * param buckets
    * unifying the buckets
- 17 Mar, 2021 (1 commit)
  - Benjamin Lefaudeux authored
- 15 Mar, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * extending the current state_dict interface: make it possible to do everything in a single call, and to checkpoint on all ranks (see the sketch below)
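The 15 Mar change above touches sharded-optimizer checkpointing. A minimal sketch of the OSS checkpoint flow it extends, assuming `torch.distributed` is already initialized and using the known `consolidate_state_dict()` / `state_dict()` pair; the exact single-call, all-rank interface added by the commit may differ.

```python
import torch
import torch.distributed as dist
from fairscale.optim import OSS

model = torch.nn.Linear(32, 32)
# OSS wraps a regular optimizer and shards its state across ranks.
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)

# ... training steps ...

# Gather the sharded optimizer state before checkpointing. The commit above
# extends this interface so that consolidation and checkpointing can happen
# in a single call, and on every rank.
optimizer.consolidate_state_dict()  # shards are collected on rank 0 by default
if dist.get_rank() == 0:
    torch.save(
        {"model": model.state_dict(), "optim": optimizer.state_dict()},
        "checkpoint.pt",
    )
```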
- 11 Mar, 2021 (2 commits)
  - Benjamin Lefaudeux authored
    * Adding a hard sync barrier before the broadcast; mostly useful for Gloo, since NCCL is synced behind the scenes (see the sketch below)
    * adding a proper unit test
    * adding a unit test for https://github.com/facebookresearch/fairscale/pull/510
  - Benjamin Lefaudeux authored
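A generic `torch.distributed` illustration of the barrier-before-broadcast pattern from the first commit above; this is not the fairscale internal code, just the idea.

```python
import torch
import torch.distributed as dist

def broadcast_with_barrier(tensor: torch.Tensor, src: int = 0) -> torch.Tensor:
    # Make sure every rank has reached this point before shipping data.
    # NCCL tends to serialize this on the stream anyway; Gloo benefits
    # from the explicit barrier.
    dist.barrier()
    dist.broadcast(tensor, src=src)  # reference copy comes from `src`
    return tensor
```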
- 09 Mar, 2021 (2 commits)
  - Benjamin Lefaudeux authored
    * seemingly fix flakiness for Gloo by checking all the communication handles (see the sketch below)
  - Benjamin Lefaudeux authored
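A hedged sketch of the "check all communication handles" idea from the first commit above: keep every async work handle and wait on all of them, rather than only the last one. The function name and structure are illustrative, not the ShardedDDP internals.

```python
import torch.distributed as dist

def broadcast_all_params(params, src: int = 0) -> None:
    # Launch every broadcast asynchronously and keep the work handles.
    handles = [dist.broadcast(p.data, src=src, async_op=True) for p in params]
    # Wait on *all* of them; Gloo in particular needs every handle completed.
    for handle in handles:
        handle.wait()
```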
- 05 Mar, 2021 (2 commits)
  - Benjamin Lefaudeux authored
    * [perf][minor] cache the rank lookups, small ShardedDDP perf fix
    * tiny improvement, code quality
  - Benjamin Lefaudeux authored
    * change empty shard handling for OSS, do not rely on asserts
    * code review
- 23 Feb, 2021 (1 commit)
  - Myle Ott authored
    Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper. Compared to PyTorch DDP:
    * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
    * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
    * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
      * all-gather parameters at the start of the forward pass and the start of the backward pass
      * reduce-scatter grads at the end of the backward pass
    Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
    Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
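A minimal usage sketch of the FSDP wrapper described above, assuming `torch.distributed` is already initialized with one GPU per rank; the `reshard_after_forward` flag comes straight from the commit message, everything else here is illustrative.

```python
import torch
from fairscale.nn import FullyShardedDataParallel as FSDP

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
).cuda()

# reshard_after_forward=True trades ~50% extra communication for ZeRO-3-style
# memory savings; False keeps communication on par with DDP (ZeRO-2-like).
model = FSDP(model, reshard_after_forward=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 512, device="cuda")
loss = model(x).sum()
loss.backward()   # grads are reduce-scattered at the end of the backward pass
optimizer.step()  # each rank only holds and updates its own parameter shard
```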
- 22 Feb, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * adding an assert + corresponding unit test
    * updated changelog
    * adjusting the adascale tests
- 19 Feb, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * test with and without buckets for all the ShardedDDP unit tests
    * parametrize all the things
    * refactoring, adding even more combinations at times
    * handle hosts not having CUDA
- 14 Feb, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * WIP, needs to be fixed!
    * should be a fix, many thanks Weiyi Zheng
    * slightly better unit test, sorting the states on the way out
    * reproducing the issue from Weiyi in a unit test, and finally properly fixing it
    * fixing the unit test on pytorch 1.5 (original loss diff: 26.404895782470703 vs 26.404342651367188)
- 12 Feb, 2021 (3 commits)
  - Benjamin Lefaudeux authored
    This reverts commit 8be9d930.
  - Benjamin Lefaudeux authored
    * many thanks Weiyi Zheng
  - Benjamin Lefaudeux authored
    * Better unit testing
    * Make it possible to refresh the DDP assumptions when the model has changed. Make it optional so that you save some time (see the sketch below)
    * Enabling accumulation tests
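A hedged sketch of the "refresh the DDP assumptions" flow from the last commit above, using `ShardedDataParallel` together with `OSS`; the `refresh_trainable()` method name reflects the fairscale API of that period and should be treated as an assumption.

```python
import torch
from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP
from fairscale.optim import OSS

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Linear(16, 4))
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)
ddp_model = ShardedDDP(model, optimizer)

# ... train for a while ...

# Freeze part of the model, then tell ShardedDDP that its trainability
# assumptions are stale. Doing this explicitly, rather than re-checking on
# every step, is the "optional, saves some time" part of the commit message.
for p in model[0].parameters():
    p.requires_grad = False
ddp_model.refresh_trainable()
```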
- 08 Feb, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * flat params all along, way simpler (see the sketch below)
    * updating the docstring
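A generic illustration of the "flat params" idea from the commit above: pack several same-dtype tensors into one contiguous buffer and keep views into it, so a single collective can cover the whole bucket. This is not the fairscale implementation.

```python
import torch

def flatten_into_bucket(tensors):
    # One contiguous buffer holding every tensor, back to back.
    bucket = torch.cat([t.detach().reshape(-1) for t in tensors])
    # Views into the buffer, shaped like the original tensors, so callers can
    # keep reading/writing per-tensor while communicating the flat buffer.
    views, offset = [], 0
    for t in tensors:
        views.append(bucket[offset : offset + t.numel()].view_as(t))
        offset += t.numel()
    return bucket, views
```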
- 05 Feb, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    fix a broken earlier commit, which only worked for the first step
- 04 Feb, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    cache this iterator, an easy speed-up
- 02 Feb, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * adding a test to prove the interoperability with upstream pytorch
    * updating the changelog
    * eager state pruning
    * pytorch 1.5 compat
- 27 Jan, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * removing the torch util altogether, broken on 1.7.1
- 26 Jan, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * fix for torch.distributed broadcast failing on a dummy object
- 21 Jan, 2021 (1 commit)
  - Benjamin Lefaudeux authored
    * Couple of small improvements, no logic changes
- 20 Jan, 2021 (1 commit)
  - Benjamin Lefaudeux authored
- 11 Jan, 2021 (2 commits)
  - Benjamin Lefaudeux authored
    * tentatively fixing the CPU version of the CircleCI jobs, now the pipe tests are the last ones standing
    * fixing OSS backcompat, trying to fix RPC in old pytorch also
    * fixing the file-based init in torch 1.5
  - Benjamin Lefaudeux authored
    * min bucket size with the model size
    * resize the bucket after all the params have been squeezed in, to save a tiny bit of memory (see the sketch below)
    * minor, ensure that the cache is freed and improve the comments
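A hedged sketch of the bucket-sizing idea from the second commit above: cap the bucket by the actual model size, fill it greedily, then shrink it to what was actually used. All names here are made up for the example; the real OSS bucketing logic differs.

```python
import torch

def build_broadcast_bucket(params, max_bucket_mb: float = 16.0) -> torch.Tensor:
    # Never allocate a bucket larger than the model itself.
    model_numel = sum(p.numel() for p in params)
    cap = min(int(max_bucket_mb * 2**20 / 4), model_numel)  # fp32 elements
    bucket = torch.zeros(cap, dtype=torch.float32)

    offset = 0
    for p in params:
        if offset + p.numel() > cap:
            break  # params that do not fit are communicated individually
        bucket[offset : offset + p.numel()].copy_(p.detach().reshape(-1))
        offset += p.numel()

    # Resize down to what was actually filled, saving a bit of memory.
    return bucket[:offset].clone()
```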
- 08 Jan, 2021 (3 commits)
  - Benjamin Lefaudeux authored
  - Benjamin Lefaudeux authored
  - Joshua Meier authored
    * add additional unit test
    * support model parallelism in OSS
- 30 Dec, 2020 (1 commit)
  - Benjamin Lefaudeux authored
    * removing a dead call since ShardedDDP, small speedup
    * unrelated, but filling in the changelog
    * another nit
- 22 Dec, 2020 (1 commit)
  - Benjamin Lefaudeux authored
    * fix, one liner
    * adjust so that frozen trunks still get spread, even if this should have little consequence
    * removing dead code, hopeful unit test fix
    * now with some linting
    * adding a proper unit test case
- 19 Dec, 2020 (1 commit)
  - Benjamin Lefaudeux authored
    [OSS] Getting rid of the "should bucket" hash table, just use a list + non-trainable params fix (#259)
    * Getting rid of the "should bucket" hash table, just use a list
    * Properly handle all params, with or without requires_grad
    * make sure that this case is unit tested
- 17 Dec, 2020 (2 commits)
  - Joshua Meier authored
  - Benjamin Lefaudeux authored
    * typo, sorry about that
    * small perf fix
- 16 Dec, 2020 (1 commit)
  - Benjamin Lefaudeux authored
    * Better handling of the callback queue, try to consume it as we go (see the sketch below)
    * dumping buckets for the reduce part, always the same unused params issue
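A generic sketch of "consume the callback queue as we go" from the commit above: completed async reduce handles are popped and their callbacks run opportunistically, instead of only being drained at the end of the backward pass. The class and method names are illustrative, not the ShardedDDP internals.

```python
from collections import deque
import torch.distributed as dist

class ReduceQueue:
    def __init__(self):
        self._queue = deque()  # (async work handle, callback) pairs, FIFO

    def push(self, tensor, callback):
        # Launch the reduction asynchronously and remember what to do with it.
        handle = dist.all_reduce(tensor, async_op=True)
        self._queue.append((handle, callback))
        self._consume_ready()  # opportunistically drain finished work

    def _consume_ready(self):
        # Pop and run callbacks for work that has already completed.
        while self._queue and self._queue[0][0].is_completed():
            _, callback = self._queue.popleft()
            callback()

    def flush(self):
        # Final drain, e.g. at the end of the backward pass.
        while self._queue:
            handle, callback = self._queue.popleft()
            handle.wait()
            callback()
```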