Commits · ca74ee2217e04e43311e608a2cf2a51a822db926 · OpenDAS / fairscale

19 Dec, 2020 1 commit

[OSS] Getting rid of the "should bucket" hash table, just use a list + non... · ca74ee22

Benjamin Lefaudeux authored Dec 19, 2020

[OSS] Getting rid of the "should bucket" hash table, just use a list + non trainable params fix (#259)

* Getting rid of the "should bucket" hash table, just use a list
Properly handle all params, with or without requires_grad

* make sure that this case is unit tested

ca74ee22

17 Dec, 2020 3 commits
- [fix] grad scaler optional process group (#257) · bd7e25a5
  Benjamin Lefaudeux authored Dec 17, 2020
  
  bd7e25a5
- [fix] OSS - resolve fp16 overflow in clip grad norm (#263) · 2df5ca2d
  Joshua Meier authored Dec 17, 2020
  
  2df5ca2d
- [fix] OSS - typo + small perf fix (#256) · 2d9243bf
  Benjamin Lefaudeux authored Dec 16, 2020
  
  * typo, sorry about that
  * small perf fix
  
  2d9243bf
16 Dec, 2020 6 commits

[perf] ShardedDDP: better handling of the callback queue, try to consume it as we go. (#254) · 351f35e1
Benjamin Lefaudeux authored Dec 16, 2020

* Better handling of the callback queue, try to consume it as we go.

* dumping buckets for the reduce part, always the same unused params issue

351f35e1

[docs] lintfixes (#255) · 19cb5938

Benjamin Lefaudeux authored Dec 16, 2020

* lintfixes

* come on black

* Update tutorial_pipe_multiprocess.py

make RANK global like the other tutorials
Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>

19cb5938

[doc] Update README.md (#244) · 550f1ab7

VitaliyLi authored Dec 16, 2020

* Update README.md

* Update README.md

update capitalization
Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>

550f1ab7

[feat] add CPU support to tutorials in examples + factorize tutorials (#247) · 02478eb3

jessijzhao authored Dec 15, 2020

* [feat] add CPU support to tutorials in examples

* now works on a machine without cuda
* fixes some minor typos

* [cleanup] factorize tutorials in examples

* collects duplicate code across tutorials in helpers.py

* [fix] getData in tutorials now returns iterable

02478eb3

[fix] solutions to recent pip's isolation failing to build from source (#249) · 7e5ddcd2
Stas Bekman authored Dec 15, 2020

7e5ddcd2

[feat]: AdaScale work with lr_scheduler and tests, examples (#229) · d65cd838

Min Xu authored Dec 15, 2020

* [doc]: AdaScale example and notes

* formatted notes correctly as suggested by Benjamin

* added feature and unit test to make sure lr_scheduler works

* update the example with lr_scheduler

* fixed doc with "make html"

* addressed Mike's suggestions

d65cd838

15 Dec, 2020 1 commit
- [cleanup] ShardedDDP - inline gatekeeper (#248) · 4402c410
  Benjamin Lefaudeux authored Dec 15, 2020
  
  4402c410
14 Dec, 2020 1 commit

[fix] more adascale gradient accumulation tests and smoothing factor fix (#235) · f74afebb

Min Xu authored Dec 14, 2020

* better ddp adascale tests

* make sure the single node test uses the same test cases and expected gains

* added unit test that covers smoothing factor

- tested by re-introducing the bug and seeing the test fail as expected.

f74afebb

10 Dec, 2020 2 commits

[doc] updating the pipe balance doc a bit (#243) · 2eef71b9

Min Xu authored Dec 10, 2020

* [doc] updating the pipe balance doc a bit

- Also added a warning to pipeline.py when the partition output is not
supported.

* addressed Mandeep's comment

2eef71b9

[fix] Check ShardedDDP / DDP parity + bugfix (#242) · 138b2033

Benjamin Lefaudeux authored Dec 09, 2020

* unit test checking ddp and sharded_ddp equivalence, reproducing the issue that Sean spotted
* fixing the issue: requests in flight were not counted properly
* adding a multiple optimizers case

138b2033

09 Dec, 2020 1 commit
- [fix] Renaming large logo file - free of spaces (#240) · 6afbe677
  Benjamin Lefaudeux authored Dec 09, 2020
  
  6afbe677
07 Dec, 2020 1 commit
- [fix] ShardedGradScaler - remove the strict optimizer type requirement (#237) · c6f40418
  Benjamin Lefaudeux authored Dec 07, 2020
  
  * removing strict typing requirement, broken by ClassyVision
  
  c6f40418
06 Dec, 2020 1 commit
- [fix] skipping NCCL tests on 2-GPU systems (#233) · bb468670
  Min Xu authored Dec 05, 2020
  
  bb468670
05 Dec, 2020 1 commit
- [doc] hotfixes, old documentation (#232) · 92210136
  Benjamin Lefaudeux authored Dec 04, 2020
  
  Thanks Jessica for the heads up!
  
  92210136
04 Dec, 2020 2 commits

Logo (#227) · 47e57935

Vittorio Caggiano authored Dec 04, 2020

* add logo

* Update README.md
Co-authored-by: Vittorio Caggiano <caggiano@fb.com>

47e57935

[fix] Fix iGPT buckets with ShardedDDP (#223) · 6d223777

Benjamin Lefaudeux authored Dec 03, 2020

* proper unit testing, but no other solution than disabling bucketing for now; the couple of options tested do not work

6d223777

03 Dec, 2020 1 commit

[feat] AdaScale: Gradient Accumulation and Add PyTest unit tests (#202) · ce5860ea

Min Xu authored Dec 03, 2020

* added AdaScale to README

* [adascale] added gradient accumulation

- added gradient accumulation
- tested with cifar full trainings with different values of accumulation
and verified the full accuracy is obtained
- also removed the patch optimize flag until we need it

* [adascale] adding pytest

- added basic and ddp tests and grad_accum
- closes #195

* added changelog

* added ddp grad_accum test

* moved ddp and non-ddp tests into separate files

* added checkpoint test

* more doc

* addressed Mike's comments

ce5860ea
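
For readers unfamiliar with the term, gradient accumulation in general can be sketched as below. This is a generic illustration in plain Python, not AdaScale's implementation and not fairscale code: the toy model `y = w * x`, the helper names `grad_mse` and `accumulated_grad`, and the data are all assumptions made for the example. It shows that averaging the gradients of equally sized micro-batches gives the same gradient as one pass over the full batch.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def grad_mse(w: float, batch: Sequence[Sample]) -> float:
    """Gradient w.r.t. w of the mean squared error of the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, micro_batches: List[Sequence[Sample]]) -> float:
    """Accumulate per-micro-batch gradients, then average them.

    Assumes all micro-batches have the same size; the optimizer would step
    only once, after the last micro-batch.
    """
    total = 0.0
    for micro_batch in micro_batches:
        total += grad_mse(w, micro_batch)
    return total / len(micro_batches)


# Hypothetical data: four samples split into two micro-batches of two.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
micro_batches = [data[:2], data[2:]]
w = 0.5

full = grad_mse(w, data)
accum = accumulated_grad(w, micro_batches)
```

With these values, `full` and `accum` agree, which is the property a gradient accumulation test typically checks.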

02 Dec, 2020 1 commit
- [fix] make sure pip package includes header files (#221) · 867cc2df
  msbaines authored Dec 01, 2020
  
  Fixes #190
  
  867cc2df
01 Dec, 2020 4 commits
- [docs] Minor refactor, trying to improve a little bit the html (#220) · 8b5b9540
  Benjamin Lefaudeux authored Dec 01, 2020
  
  8b5b9540
- [chore] Refactor unit testing, shared utils (#218) · e83da060
  Benjamin Lefaudeux authored Dec 01, 2020
  
  e83da060
- [chore] create v0.1.0 (#219) · 1db8bbda
  msbaines authored Dec 01, 2020
  
  1db8bbda
- [fix][Pipe] fallback for Pipe tests on internal pytorch numbering (#216) · 4d8f2e59
  Benjamin Lefaudeux authored Nov 30, 2020
  
  * fallback on internal pytorch numbering
  
  4d8f2e59
30 Nov, 2020 1 commit
- [fix] OSS ad-hoc perf regression fix, more inconsistent than expected (#214) · 835ecb0c
  Benjamin Lefaudeux authored Nov 30, 2020
  
  835ecb0c
27 Nov, 2020 1 commit
- [doc] Fixing relative html links (#212) · d09f5aa2
  Benjamin Lefaudeux authored Nov 26, 2020
  
  Fixing the relative positions of the html docs.
  
  d09f5aa2
26 Nov, 2020 1 commit
- [fix] Adding a GradScaler import guard for amp with pytorch 1.5 (#210) · 8e85ce8c
  Benjamin Lefaudeux authored Nov 25, 2020
  
  8e85ce8c
24 Nov, 2020 1 commit
- [doc] make the basic example more "compilable" (#207) · 7a062894
  Stas Bekman authored Nov 24, 2020
  
  * make the basic example usable out of the box
  * clarify
  
  7a062894
22 Nov, 2020 1 commit

[fix] More robust stats for regression testing (#204) · 2b121242

Benjamin Lefaudeux authored Nov 22, 2020

* testing median and MAD

* synchronize on kernels to make sure that we're measuring the actual completion time

* adjusting the circleci threshold, not because the speed has regressed but because we now measure proper CUDA execution time

2b121242
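
The robust statistics named in this commit, the median and the MAD (median absolute deviation), can be illustrated with a minimal sketch. It is not the repository's benchmark code: the function name `median_and_mad` and the sample timings are hypothetical, and only the Python standard library is used.

```python
import statistics
from typing import Sequence, Tuple


def median_and_mad(samples: Sequence[float]) -> Tuple[float, float]:
    """Return the median and the median absolute deviation (MAD) of samples."""
    med = statistics.median(samples)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median(abs(x - med) for x in samples)
    return med, mad


# Hypothetical step timings with one slow outlier.
timings = [10.0, 10.2, 9.9, 10.1, 50.0]
med, mad = median_and_mad(timings)
mean = statistics.mean(timings)
```

Here the single outlier pulls `mean` well above the typical timing, while `med` and `mad` stay close to the bulk of the samples, which is why these statistics suit regression thresholds on noisy measurements.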

21 Nov, 2020 1 commit

[feat] ShardedDataParallel with autoreduce (#157) · ad933b34

Benjamin Lefaudeux authored Nov 21, 2020

* rewrite using autograd and Variable execution queue to make the reduce automatic
* share buckets with OSS to remove duplication
* some speed is still likely on the table, since speed vs. bucketing does not match expectations; could be a follow-up

ad933b34

20 Nov, 2020 1 commit
- [fix] make fairscale.utils a proper package (#200) · 35d4129f
  msbaines authored Nov 19, 2020
  
  35d4129f
19 Nov, 2020 4 commits
- [fix] make extension build robust to include path (#196) · 3b83ef51
  msbaines authored Nov 19, 2020
  
  Fixes #190
  
  3b83ef51
- [test] run moe mpi tests using torch_pg (#197) · cd496b36
  msbaines authored Nov 19, 2020
  
  cd496b36
- [fix] Reverting a change which slipped in #188 (#198) · ba367d39
  Benjamin Lefaudeux authored Nov 18, 2020
  
  * reverting a change which slipped in #188
  
  ba367d39
- [feat] Add CPU support for pipe.py benchmarks (#188) · a842a927
  Yuanyuan (Ana) Shen authored Nov 18, 2020
  
  * Add CPU support for pipe.py benchmarks, CUDA-free
  
  a842a927
18 Nov, 2020 2 commits
- fix bug (#193) · f80b303c
  Tom Birch authored Nov 17, 2020
  
  f80b303c
- [feat] ShardedOptim: Distributed Grad Scaler (for torch AMP) (#182) · d85acf72
  Benjamin Lefaudeux authored Nov 17, 2020
  
  * adding a shard-aware GradScaler wrap, credits to Sean Naren for the idea
  * adding stubs & explanations in the documentation
  
  d85acf72
17 Nov, 2020 1 commit

[doc] add AdaScale API doc (#191) · 587b707d

Min Xu authored Nov 17, 2020

- removed experimental warning as we have validated it on cifar and
imagenet; transformer is looking good so far too.
- fixed API doc formatting
- make it consistent with the other code in the repo
- tested by making the doc locally and inspecting the results

587b707d