- 04 Feb, 2021 4 commits
-
-
msbaines authored
-
Benjamin Lefaudeux authored
* Adding a proper ddp parity / AMP unit test, overdue * catch non-AMP pytorch
-
msbaines authored
-
msbaines authored
-
- 03 Feb, 2021 2 commits
-
-
msbaines authored
-
Benjamin Lefaudeux authored
* adding the .to(device) support + unit testing * doc update
-
- 02 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* no idea about the root issue, but it proved to be fairly narrowed (gloo+cpu+python3.8+no cuda installed) so I guess that's out of scope for fairscale
-
- 30 Jan, 2021 1 commit
-
-
msbaines authored
-
- 29 Jan, 2021 1 commit
-
-
msbaines authored
-
- 27 Jan, 2021 1 commit
-
-
msbaines authored
-
- 23 Jan, 2021 1 commit
-
-
Siddharth Goyal authored
* Add AMPnet implementation (clean version) * Move ampnet to experimental * Move stuff around pipeline * Address review comments and fix pre-commit errors * Refactor and modify delegate functionality * Modify header in pipe.py
-
- 21 Jan, 2021 3 commits
-
-
Benjamin Lefaudeux authored
* working around broken mypy
-
Myle Ott authored
-
Myle Ott authored
-
- 15 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* minor, but ease of life, one less papercut
-
- 11 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* tentatively fixing the cpu version of circleci jobs, now pipe tests are the last ones standing * fixing oss backcompat, trying to fix rpc in old pytorch also * fixing the file based init in torch 1.5
-
- 05 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests * removing redundant code, slightly more reasonable timeout, works on single cuda * finding the root bug for some of the cpu hangs, rpc init * propagating all the rpc init test changes to the pipe and model parallel tests
-
- 02 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* fix typo, backend for CPU test
-
- 30 Dec, 2020 1 commit
-
-
Sean Naren authored
* Add function to add handle for sync BN * Add test to ensure batch norm handles have been added
-
- 29 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* catching properly a given test failing if not enough gpus
-
- 28 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* file based dist init * nicer handling of broken world sizes vs. number of available GPUs, do not break but warn out
-
- 19 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
[OSS] Getting rid of the "should bucket" hash table, just use a list + non trainable params fix (#259) * Getting rid of the "should bucket" hash table, just use a list Properly handle all params, with or without requires_grad * make sure that this case is unit tested
-
- 10 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* unit test checking ddp and sharded_ddp equivalence, reproducing the issue that Sean spotted * fixing the issue, not counting requests in flight properly * adding a multiple optimizers case
-
- 04 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* proper unit testing, but no other solution than disabling bucketing for now, couple of options tested do not work
-
- 01 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* fallback on internal pytorch numbering
-
- 21 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* rewrite using autograd and Variable execution queue to make the reduce automatic * share buckets with OSS to remove duplication * some speed still likely on the table since the speed vs. bucketing does not match expectations, could be a follow up
-
- 18 Nov, 2020 1 commit
-
-
Tom Birch authored
-
- 11 Nov, 2020 2 commits
- 10 Nov, 2020 1 commit
-
-
Tom Birch authored
Adds support for: * Reused layers (e.g. for weight sharing) * Lazily-constructed layers * Single-process control via PipeRPCWrapper * PipelineStyle.AsyncScheudle, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive Also added examples for multi-process and PipeRPCWrapper
-
- 30 Oct, 2020 1 commit
-
-
msbaines authored
-
- 29 Oct, 2020 1 commit
-
-
msbaines authored
-
- 23 Oct, 2020 1 commit
-
-
msbaines authored
-
- 21 Oct, 2020 1 commit
-
-
msbaines authored
-
- 20 Oct, 2020 1 commit
-
-
Min Xu authored
- fixed typing - make it run less often to reduce CI time testing: run it in a loop make sure it is run in the right frequency.
-
- 17 Oct, 2020 1 commit
-
-
msbaines authored
-
- 16 Oct, 2020 2 commits
- 14 Oct, 2020 1 commit
-
-
msbaines authored
-