- 20 Oct, 2020 3 commits
Benjamin Lefaudeux authored
* small refactor, code cleanup
* broadcast the tensor .data attribute directly
Min Xu authored
- fixed typing
- made it run less often to reduce CI time
- testing: run it in a loop to make sure it is run at the right frequency
Min Xu authored
- close #143
- 18 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* fixing the readme for oss
- 17 Oct, 2020 2 commits
Benjamin Lefaudeux authored
* adding a cpu option
* adjust the reference loss
msbaines authored
- 16 Oct, 2020 4 commits
Min Xu authored
* [fix] fixing CircleCI for AdaScale - ran black, isort, flake8, mypy
* more fixes
Aurick Qiao authored
* Add implementation of AdaScale
* add AdaScale docs
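AdaScale adjusts the learning-rate gain when training with S data-parallel replicas, based on the variance of the per-replica gradients. The sketch below is a simplified, dependency-free illustration of the gain formula from the AdaScale SGD paper, not fairscale's actual implementation; the function name and list-based gradients are illustrative.

```python
def adascale_gain(per_replica_grads):
    """Estimate the AdaScale gain r in [1, S] from S per-replica gradients.

    per_replica_grads: list of S gradient vectors (lists of floats).
    Simplified sketch: r = (var + |mean|^2) / (var/S + |mean|^2).
    """
    S = len(per_replica_grads)
    dim = len(per_replica_grads[0])
    # Averaged (large-batch) gradient across replicas.
    avg = [sum(g[d] for g in per_replica_grads) / S for d in range(dim)]
    mu_sq = sum(a * a for a in avg)  # squared norm of the mean gradient
    # Mean squared norm of a single replica's gradient.
    single_sq = sum(sum(x * x for x in g) for g in per_replica_grads) / S
    # Variance estimate: E|g|^2 - |E g|^2 (clamped for numerical safety).
    var = max(single_sq - mu_sq, 0.0)
    denom = var / S + mu_sq
    return (var + mu_sq) / denom if denom > 0 else 1.0
```

When all replicas agree (zero variance) the gain is 1; when the gradients are pure noise (zero mean) the gain reaches S, i.e. the full linear-scaling speedup.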
msbaines authored
The expert annotation is used by clip_grads and DDP.
msbaines authored
- 15 Oct, 2020 1 commit
msbaines authored
- 14 Oct, 2020 3 commits
Benjamin Lefaudeux authored
Benjamin Lefaudeux authored
* fixing the issue w.r.t. Apex; validated with Latte, but Classy would need another pass
msbaines authored
- 10 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* bugfix
* adjust the default non-regression loss, not all_reduced now
- 09 Oct, 2020 2 commits
Benjamin Lefaudeux authored
* wrapping the model in DDP in the tutorial
* typo fix
Benjamin Lefaudeux authored
More realistic benchmarks, comparing apples to apples: DDP / OSS+DDP / OSS+SDP
- 08 Oct, 2020 4 commits
Benjamin Lefaudeux authored
* new unit test to catch rank issues in OSS
msbaines authored
Currently only implemented for a single process and expert.
ngoyal2707 authored
Authored-by: Naman Goyal <namangoyal@learnfair0755.h2.fair>
Min Xu authored
* Add unit test for checkpoint & DDP
- this change adds test cases to reproduce the error with checkpoint & DDP
- Mandeep mentioned that there is also a deadlock in this case, but this change doesn't cover that
- we cover cases where weight sharing is OK
- however, using the same module in multiple checkpoints, or find_unused_parameters, are both not OK
* added norm checks
- 06 Oct, 2020 2 commits
Benjamin Lefaudeux authored
Same bucketing strategy for OSS and SDP: sort everything ahead of time, per rank and per size, smaller tensors first. Pack the smallest elements into a fixed buffer and send it async, then send all the others async, and get back to the bucket. Once done, scatter the contents if needed.
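The size-sorted bucketing described above can be sketched as a planning step: sort tensors by size, pack the smallest ones into a fixed-capacity buffer, and send everything that doesn't fit individually. This is an assumed simplification for illustration, not fairscale's actual code; the function and variable names are hypothetical.

```python
def plan_buckets(tensor_sizes, bucket_capacity):
    """Split tensor indices into (bucketed, direct) send lists.

    tensor_sizes: per-tensor element counts; bucket_capacity: fixed buffer size.
    """
    # Smallest tensors first, so many tiny broadcasts collapse into one.
    order = sorted(range(len(tensor_sizes)), key=lambda i: tensor_sizes[i])
    bucketed, direct, used = [], [], 0
    for i in order:
        if used + tensor_sizes[i] <= bucket_capacity:
            bucketed.append(i)   # copied into the fixed buffer, sent as one op
            used += tensor_sizes[i]
        else:
            direct.append(i)     # too big for the buffer, sent async on its own
    return bucketed, direct
```

Sorting deterministically (per rank and per size) is what keeps all ranks agreeing on the same plan without extra coordination.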
msbaines authored
- 05 Oct, 2020 1 commit
msbaines authored
- 02 Oct, 2020 1 commit
msbaines authored
- 01 Oct, 2020 3 commits
msbaines authored
Joshua Meier authored
support optimizer state sharding for megatron
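Optimizer state sharding splits the optimizer's per-parameter state across ranks so each rank only holds a fraction of it. A common way to assign parameters to ranks is greedy least-loaded bin packing; the sketch below illustrates that idea under assumed, illustrative names, and is not fairscale's actual partitioning code.

```python
def shard_params(param_sizes, world_size):
    """Assign parameter indices to ranks, balancing total element count.

    Returns a list of per-rank lists of parameter indices.
    """
    shards = [[] for _ in range(world_size)]
    loads = [0] * world_size
    # Largest parameters first keeps the greedy split close to balanced.
    for idx in sorted(range(len(param_sizes)), key=lambda i: -param_sizes[i]):
        rank = loads.index(min(loads))  # least-loaded rank so far
        shards[rank].append(idx)
        loads[rank] += param_sizes[idx]
    return shards
```

Each rank then builds its local optimizer over only its shard, and the updated parameters are broadcast back to the other ranks after each step.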
Benjamin Lefaudeux authored
* minor, but gives some memory back
* adjust CI and regression checks to 4 GPUs
- 29 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- adding the buffer broadcast option
- minor cleanup in ShardedDDP
- 24 Sep, 2020 3 commits
Vittorio Caggiano authored
add badges and a link to readthedocs
Vittorio Caggiano authored
Benjamin Lefaudeux authored
- small benchmark refactor, only one for all backends and DDP
- deterministic, enforce alignment with PyTorch DDP
- 22 Sep, 2020 3 commits
Benjamin Lefaudeux authored
* various fixes, no more issues with `make html` and more API fields should be populated
Benjamin Lefaudeux authored
* Broadcasting grad-enabled tensors is forbidden in Gloo, because broadcast is not differentiable. Added a workaround.
Benjamin Lefaudeux authored
* Doc extensions to some APIs
* Fix the benchmark and tutorial
- 17 Sep, 2020 5 commits
Tom Birch authored
Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines).
* Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess
* Added support for lazy construction of modules (see lazy_construction for an example)
* Added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
* Copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess
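The lazy-construction idea above can be sketched without any torch dependency: each stage is given as a zero-argument factory, and under the multi-process style a process only calls the factories for the stages it owns, so no process pays the memory cost of building the whole model. Names (`PipelineStyle`, `materialize_stages`, the round-robin assignment) are illustrative assumptions, not the actual Pipe API.

```python
from enum import Enum, auto

class PipelineStyle(Enum):
    SingleProcess = auto()
    MultiProcess = auto()

def materialize_stages(stage_factories, style, rank=0, world_size=1):
    """Build only the pipeline stages this process is responsible for.

    stage_factories: list of zero-argument callables, one per stage.
    """
    if style is PipelineStyle.SingleProcess:
        # Classic single-process pipeline: build every stage locally.
        return [f() for f in stage_factories]
    # MultiProcess: assume round-robin stage ownership; construct only ours.
    return [
        f() for i, f in enumerate(stage_factories) if i % world_size == rank
    ]
```

The factories defer construction until after process placement is known, which is exactly what makes multi-machine pipelines feasible for models too large for one host.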
Benjamin Lefaudeux authored
- rename oss_ddp to ShardedDataParallel
- some refactoring
- ShardedDataParallel owns the sharded optimizer, exposed if need be
- some small perf bumps
msbaines authored
Benjamin Lefaudeux authored
Benjamin Lefaudeux authored