- 06 Apr, 2021 1 commit
Benjamin Lefaudeux authored
-
- 05 Apr, 2021 1 commit
Benjamin Lefaudeux authored
* making APIs more private
* linting
-
- 30 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* survive the model being moved to device post-construction
* make sure that a unit test would catch a regression
-
- 25 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* re-activating unit test
* removing changes that slipped in
-
- 18 Mar, 2021 2 commits
Benjamin Lefaudeux authored
* extracting the buckets in a dedicated class, fixing the resize_ bug
* adding a unit test
* copyright
-
Benjamin Lefaudeux authored
-
- 17 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* Deactivating buckets for a single rank: they do not crash there, but they are not useful either
-
- 09 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* seemingly fix flakiness for gloo by checking all comm handles
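A minimal sketch of the pattern this fix points at: keep every async work handle and wait on all of them before moving on. This assumes an already-initialized process group; `reduce_all_grads` is a hypothetical helper, not the fairscale implementation.

```python
import torch.distributed as dist

def reduce_all_grads(params, group=None):
    # Launch one async all_reduce per gradient and keep every work handle.
    handles = [
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=group, async_op=True)
        for p in params
        if p.grad is not None
    ]
    # Waiting on every handle, not just the last one, keeps gloo from racing
    # ahead of in-flight collectives (the flakiness mentioned above).
    for handle in handles:
        handle.wait()
```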
-
- 05 Mar, 2021 1 commit
Benjamin Lefaudeux authored
* [perf][minor] cache the rank lookups, small ShardedDDP perf fix
* tiny improvement, code quality
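A small sketch of what caching the rank lookups can look like, assuming an initialized process group; the class and method names below are hypothetical.

```python
import torch.distributed as dist

class ShardingInfo:
    def __init__(self, group=None):
        self.group = group
        # Cache once at construction: dist.get_rank()/get_world_size() are cheap
        # but add up when queried per parameter in the backward path.
        self.rank = dist.get_rank(group)
        self.world_size = dist.get_world_size(group)

    def param_owner(self, param_index: int) -> int:
        # Round-robin assignment of parameters to ranks, using the cached size.
        return param_index % self.world_size
```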
-
- 04 Mar, 2021 1 commit
Min Xu authored
- cover them in terms of code path only
- numerically, AdaScale is different on SDP/FSDP than on DDP, mainly due to the partial view of the gradients
- this doesn't mean it is definitely not useful, but it is yet to be validated
- not going to spend too much time on it until we have a real use case
-
- 25 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* bring back a fix from FSDP, may help a few existing users
-
- 23 Feb, 2021 2 commits
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* POC, testing against the DDP comm hook when available
* docs, adding a reference to DDP's compress hook
* updating changelog, prep for v0.1.8 release
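The DDP comm hook referenced here is PyTorch's gradient-compression hook. A hedged sketch of how a plain DDP baseline would enable it for comparison, assuming a PyTorch version that ships `register_comm_hook`:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Assumes the default process group has already been initialized.
model = DDP(torch.nn.Linear(32, 32))

# Compress gradients to fp16 during all_reduce: the compress hook mentioned above.
model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
```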
-
- 19 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* test with and without buckets for all the ShardedDDP unit tests
* parametrize all the things
* refactoring, adding even more combinations at times
* handle hosts not having CUDA
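A hypothetical illustration of the "parametrize all the things" approach; the parameter names are assumptions, and the real tests live in the fairscale test suite.

```python
import pytest
import torch

@pytest.mark.parametrize("reduce_buffer_size", [0, 2 ** 23])  # 0 disables buckets
@pytest.mark.parametrize("backend", ["gloo", "nccl"])
def test_sharded_ddp(reduce_buffer_size, backend):
    # Skip combinations that cannot run on CPU-only hosts.
    if backend == "nccl" and not torch.cuda.is_available():
        pytest.skip("NCCL requires a CUDA-capable host")
    ...  # spawn workers and run the bucketed / non-bucketed variants
```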
-
- 18 Feb, 2021 2 commits
Benjamin Lefaudeux authored
* Adding multiple groups support to ShardedDDP + unit test
* adding gloo to the backends tested for multiple groups
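A minimal sketch of driving ShardedDDP with a dedicated process group; the `group` and `process_group` keyword names are assumptions taken from later fairscale releases, not confirmed by this log.

```python
import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel

# Restrict sharding and gradient reduction to a subset of ranks.
subgroup = dist.new_group(ranks=[0, 1])

base = torch.nn.Linear(16, 16)
optimizer = OSS(base.parameters(), optim=torch.optim.SGD, lr=0.1, group=subgroup)  # keyword assumed
model = ShardedDataParallel(base, optimizer, process_group=subgroup)               # keyword assumed
```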
-
Benjamin Lefaudeux authored
* [fix] ShardedDDP train/eval modes
* Update CHANGELOG.md
-
- 17 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* initial implementation, with unit test and assert
* added changelog and better debug string
-
- 12 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed. Make it optional so that you can save some time
* Enabling accumulation tests
-
- 05 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* minor
* minor
-
- 03 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* adding the .to(device) support + unit testing
* doc update
-
- 29 Jan, 2021 1 commit
Benjamin Lefaudeux authored
-
- 21 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* A couple of small improvements, no logic changes
-
- 15 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* minor, but an ease-of-life change, one less papercut
-
- 08 Jan, 2021 2 commits
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* minor, not life-changing but removing a dependency on runtime optim
-
- 02 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* fix typo, backend for CPU test
-
- 30 Dec, 2020 1 commit
Sean Naren authored
* Add function to add handle for sync BN
* Add test to ensure batch norm handles have been added
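The sync-BN handles presumably pair with PyTorch's SyncBatchNorm. A hedged sketch of the standard user-side preparation (this is plain PyTorch, not the helper added in this commit):

```python
import torch

# Convert regular BatchNorm layers to SyncBatchNorm before wrapping the model,
# so running stats are synchronized across ranks during training.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8))
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
```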
-
- 19 Dec, 2020 1 commit
Benjamin Lefaudeux authored
[OSS] Getting rid of the "should bucket" hash table, just use a list + non-trainable params fix (#259)
* Getting rid of the "should bucket" hash table, just use a list. Properly handle all params, with or without requires_grad
* make sure that this case is unit tested
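A hypothetical sketch of the list-based bookkeeping described above, covering parameters with and without requires_grad; the helper name and size cutoff are made up for illustration.

```python
import torch

def collect_bucketable_params(module, bucket_cutoff_numel=2 ** 16):
    # Walk *all* parameters once: frozen params never produce grads, so they
    # are excluded from reduction but no longer fall through the cracks.
    to_bucket, direct, frozen = [], [], []
    for p in module.parameters():
        if not p.requires_grad:
            frozen.append(p)
        elif p.numel() < bucket_cutoff_numel:
            to_bucket.append(p)  # small enough to go through the flat buffer
        else:
            direct.append(p)     # reduced on its own
    return to_bucket, direct, frozen
```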
-
- 16 Dec, 2020 2 commits
Benjamin Lefaudeux authored
* Better handling of the callback queue, try to consume it as we go.
* dumping buckets for the reduce part, always the same unused params issue
-
jessijzhao authored
* [feat] add CPU support to tutorials in examples
* now works on a machine without CUDA
* fixes some minor typos
* [cleanup] factorize tutorials in examples
* collects duplicate code across tutorials in helpers.py
* [fix] getData in tutorials now returns an iterable
-
- 15 Dec, 2020 1 commit
Benjamin Lefaudeux authored
-
- 04 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* proper unit testing, but no solution other than disabling bucketing for now; a couple of the options tested do not work
-
- 21 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* rewrite using autograd and the Variable execution queue to make the reduce automatic
* share buckets with OSS to remove duplication
* some speed is likely still on the table, since the speed vs. bucketing does not match expectations; could be a follow-up
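A minimal sketch of the autograd-hook pattern behind "making the reduce automatic": a per-parameter gradient hook fires during backward and launches an async all_reduce. This shows the general idea only, not the fairscale implementation; gradient averaging is omitted for brevity.

```python
import torch
import torch.distributed as dist

def attach_reduce_hooks(model, group=None):
    handles = []

    def make_hook():
        def hook(grad):
            # Called by autograd as soon as this gradient is produced.
            handles.append(dist.all_reduce(grad, group=group, async_op=True))
            return grad
        return hook

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(make_hook())
    return handles  # wait on these before the optimizer step
```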
-
- 21 Oct, 2020 1 commit
Min Xu authored
- Aurick noticed this bug and I ran into it yesterday
- after the fix, our cifar training shows the same gain values from different replicas now:
```
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3512124098087777
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3512124098087777
20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000600 fwd 0:00:00.003678 loss 0:00:00.000086 bwd 0:00:00.314158 update 0:00:00.002132 rest 0:00:00.000399
20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000643 fwd 0:00:00.003460 loss 0:00:00.000084 bwd 0:00:00.314678 update 0:00:00.002001 rest 0:00:00.000408
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3514997779980324
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3514997779980324
20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000732 fwd 0:00:00.003689 loss 0:00:00.000086 bwd 0:00:00.314176 update 0:00:00.002146 rest 0:00:00.000397
20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000646 fwd 0:00:00.003542 loss 0:00:00.000089 bwd 0:00:00.314549 update 0:00:00.001956 rest 0:00:00.000392
20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.352149646693932
20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.352149646693932
```
-
- 06 Oct, 2020 1 commit
Benjamin Lefaudeux authored
Same bucketing strategy for OSS and SDP: sort everything ahead of time, per rank and per size, smaller tensors first. Bucket the smallest elements in a fixed buffer, send it async, then send all the others async, and get back to the bucket. Once done, scatter the contents back if needed.
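A hypothetical sketch of the sort-and-pack step described above (per destination rank, smallest tensors first into a fixed flat buffer, the rest sent individually); the helper name and buffer size are made up for illustration.

```python
import torch

def plan_buckets(params_per_rank, buffer_numel=2 ** 19):
    plans = []
    for rank, params in enumerate(params_per_rank):
        params = sorted(params, key=lambda p: p.numel())  # smallest first
        buffer = params[0].new_zeros(buffer_numel) if params else None
        offset, bucketed, direct = 0, [], []
        for p in params:
            if buffer is not None and offset + p.numel() <= buffer_numel:
                # Pack the small tensors into the fixed buffer, sent in one go.
                buffer[offset : offset + p.numel()].copy_(p.detach().reshape(-1))
                bucketed.append((p, offset))
                offset += p.numel()
            else:
                direct.append(p)  # too large for the buffer, sent on its own
        plans.append((rank, buffer, bucketed, direct))
    return plans
```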
-
- 29 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- adding the buffer broadcast option
- minor cleanup in ShardedDDP
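A small sketch of what the buffer broadcast option amounts to, using plain torch.distributed calls; the helper name is hypothetical.

```python
import torch.distributed as dist

def broadcast_buffers(module, src_rank=0, group=None):
    # Keep BatchNorm running stats and other buffers identical across ranks
    # by broadcasting them from one source rank.
    for buf in module.buffers():
        dist.broadcast(buf, src=src_rank, group=group)
```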
-
- 17 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- rename oss_ddp to ShardedDataParallel
- some refactoring
- ShardedDataParallel owns the sharded optimizer, exposed if need be
- some small perf bumps
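A minimal usage sketch of the renamed wrapper owning the sharded optimizer, assuming an already-initialized process group; import paths and keywords reflect later fairscale releases and may differ from this revision.

```python
import torch
import torch.nn.functional as F
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel

base = torch.nn.Linear(32, 2)
optimizer = OSS(base.parameters(), optim=torch.optim.SGD, lr=0.1)
model = ShardedDataParallel(base, optimizer)  # the wrapper owns the sharded optimizer

inputs, targets = torch.randn(8, 32), torch.randint(0, 2, (8,))
loss = F.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()
```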
-
- 28 Aug, 2020 1 commit
Min Xu authored
- added train(mode) method to be aware of eval mode
-
- 06 Aug, 2020 1 commit
Min Xu authored
Co-authored-by: Min Xu <m1n@fb.com>
-