Commits · 3d02f052e565f6bbef86d54d361165ae5d2cee7b · OpenDAS / fairscale

08 Jan, 2021 2 commits
- [refactor][OSS] Adding a pytorch parity unit test (#298) · 3d02f052
  Benjamin Lefaudeux authored Jan 08, 2021
```
* adding a parity unit test
* code review, better testing, use torch defaults and check for the loss, log world size
```
  3d02f052
- [refactor][OSS] Removing ad-hoc object broadcast, use pytorch's (#297) · 3399e97c
  Benjamin Lefaudeux authored Jan 08, 2021
  
  3399e97c
16 Dec, 2020 1 commit

[feat]: AdaScale work with lr_scheduler and tests, examples (#229) · d65cd838

Min Xu authored Dec 15, 2020

* [doc]: AdaScale example and notes

* formatted notes correctly as suggested by Benjamin

* added feature and unit test to make sure lr_scheduler works

* update the example with lr_scheduler

* fixed doc with "make html"

* addressed Mike's suggestions

d65cd838

01 Dec, 2020 1 commit
- [chore] Refactor unit testing, shared utils (#218) · e83da060
  Benjamin Lefaudeux authored Dec 01, 2020
  
  e83da060
21 Nov, 2020 1 commit

[feat] ShardedDataParallel with autoreduce (#157) · ad933b34

Benjamin Lefaudeux authored Nov 21, 2020

* rewrite using autograd and Variable execution queue to make the reduce automatic
* share buckets with OSS to remove duplication
* some speed still likely on the table since the speed vs. bucketing does not match expectations, could be a follow up

ad933b34

18 Nov, 2020 1 commit

[feat] ShardedOptim: Distributed Grad Scaler (for torch AMP) (#182) · d85acf72

Benjamin Lefaudeux authored Nov 17, 2020

* adding a shard-aware GradScaler wrap, credits to Sean Naren for the idea
* adding stubs & explanations in the documentation

d85acf72

16 Nov, 2020 1 commit
- [feat] OSS-aware clip grads, bridge sharded states (#167) · ade312c4
  Benjamin Lefaudeux authored Nov 16, 2020
```
add a clip gradients util, equivalent to torch's but aware of the sharded states. Add a corresponding unit test
```
  ade312c4
11 Nov, 2020 1 commit
- [refactor] moe: remove G dimension (#183) · 89176e34
  msbaines authored Nov 11, 2020
  
  89176e34
10 Nov, 2020 1 commit

Single-process control via PipeRPCWrapper (#156) · 5d4f50fb

Tom Birch authored Nov 10, 2020

Adds support for:
* Reused layers (e.g. for weight sharing)
* Lazily-constructed layers
* Single-process control via PipeRPCWrapper
* PipelineStyle.AsyncScheudle, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive

Also added examples for multi-process and PipeRPCWrapper

5d4f50fb

28 Oct, 2020 1 commit
- [refactor] moe: use all_to_all_single (#168) · 2108f20e
  msbaines authored Oct 27, 2020
  
  2108f20e
23 Oct, 2020 1 commit
- [refactor] OSS - broadcasts - getting rid of the while loop (#165) · a31b08a5
  Benjamin Lefaudeux authored Oct 23, 2020
```
* small refactor, getting rid of the while loop
```
  a31b08a5
21 Oct, 2020 1 commit

[fix] fixing adascale all_reduce (#155) · 6802ad49

Min Xu authored Oct 21, 2020

- Aurick noticed this bug and I ran into it yesterday
- after the fix, our cifar training shows same gain values from
  different replics now:

```
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3512124098087777
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3512124098087777
20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000600 fwd 0:00:00.003678 loss 0:00:00.000086 bwd 0:00:00.314158 update 0:00:00.002132 rest 0:00:00.000399
20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000643 fwd 0:00:00.003460 loss 0:00:00.000084 bwd 0:00:00.314678 update 0:00:00.002001 rest 0:00:00.000408
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3514997779980324
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3514997779980324
20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000732 fwd 0:00:00.003689 loss 0:00:00.000086 bwd 0:00:00.314176 update 0:00:00.002146 rest 0:00:00.000397
20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000646 fwd 0:00:00.003542 loss 0:00:00.000089 bwd 0:00:00.314549 update 0:00:00.001956 rest 0:00:00.000392
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.352149646693932
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.352149646693932
```

6802ad49

20 Oct, 2020 2 commits
- [test] fine tune test for checkpoint & DDP (#148) · 66b2b514
  Min Xu authored Oct 20, 2020
```
- fixed typing
- make it run less often to reduce CI time

testing: run it in a loop make sure it is run in the right frequency.
```
  66b2b514
- [cleanup] mypy adascale (#149) · a0042113
  Min Xu authored Oct 20, 2020
```
- close #143
```
  a0042113
14 Oct, 2020 1 commit
- [feat] moe: add all_to_all support (#134) · 6d802f5a
  msbaines authored Oct 13, 2020
  
  6d802f5a
02 Oct, 2020 1 commit
- [feat] moe: initial implementation of Top2Gating (#118) · 7815f6f3
  msbaines authored Oct 01, 2020
  
  7815f6f3
17 Sep, 2020 1 commit

Multi-process pipe (#90) · 63f7796a

Tom Birch authored Sep 17, 2020

Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines)
* Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess
* Added support for lazy construction of modules (see lazy_construction for an example)
* Added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
* Copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess

63f7796a

16 Sep, 2020 1 commit
- [cleanup] fix pre-commit mypy issues (#87) · 4a874a6b
  msbaines authored Sep 16, 2020
  
  4a874a6b
03 Sep, 2020 1 commit

Add grad scaler (#48) · b6a5e634

Jun Ru Anderson authored Sep 03, 2020



Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require gradient scaling in order to converge.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

b6a5e634

27 Aug, 2020 1 commit

[fix] optim/oss: fix state cast (#56) · fb49b515

msbaines authored Aug 27, 2020

Workaround PyTorch bug that casts state (pytorch/pytorch#43706).

Copied from https://github.com/pytorch/fairseq/blob/v0.9.0/fairseq/optim/fp16_optimizer.py#L251-L268

fb49b515

14 Aug, 2020 2 commits
- [cleanup] get 100% coverage on oss.py (#38) · 3427a039
  msbaines authored Aug 13, 2020
```
authored-by: Mandeep Singh Baines <msb@fb.com>
```
  3427a039
- [test] using PyTorch v1.6 for Lint checks (#36) · b35a3d3f
  msbaines authored Aug 13, 2020
  
  b35a3d3f
31 Jul, 2020 3 commits
- [doc] Fix copyright in stubs (#13) · e10de4b5
  msbaines authored Jul 30, 2020
  
  e10de4b5
- [feat] Model parallel (#3) · 30f5009a
  Tom Birch authored Jul 22, 2020
  
  30f5009a
- [fix] add TransformerEncoderLayer to stubs (#5) · 63b5b166
  Jun Ru Anderson authored Jul 21, 2020
```
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
```
  63b5b166
08 Jul, 2020 1 commit
- Initial commit · 0cd65242
  Mandeep Singh Baines authored Jul 07, 2020
  
  0cd65242