- 12 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed. Make it optional so that you save some time
* Enabling accumulation tests
-
- 05 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
Fix a broken earlier commit, which only worked for the first step
-
- 03 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* precise skip, only if agent has only cpu
-
- 02 Feb, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding a test to prove the interoperability with upstream pytorch
* updating the changelog
* eager state pruning
* pytorch 1.5 compat
-
- 27 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 20 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 11 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* tentatively fixing the cpu version of circleci jobs, now pipe tests are the last ones standing
* fixing oss backcompat, trying to fix rpc in old pytorch also
* fixing the file based init in torch 1.5
-
- 08 Jan, 2021 3 commits
-
-
Benjamin Lefaudeux authored
* adding a parity unit test
* code review, better testing, use torch defaults and check for the loss, log world size
-
Benjamin Lefaudeux authored
-
Joshua Meier authored
* add additional unit test
* support model parallelism in oss
-
- 05 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests
* removing redundant code, slightly more reasonable timeout, works on single cuda
* finding the root bug for some of the cpu hangs, rpc init
* propagating all the rpc init test changes to the pipe and model parallel tests
-
- 29 Dec, 2020 1 commit
-
-
Joshua Meier authored
author: Joshua Meier
-
- 22 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* fix, one liner
* adjust so that frozen trunks get spread still, even if this should have little consequences
* removing dead code, hopeful unit test fix
* now with some linting..
* adding a proper unit test case
-
- 06 Dec, 2020 1 commit
-
-
Min Xu authored
-
- 16 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
add a clip gradients util, equivalent to torch's but aware of the sharded states. Add a corresponding unit test
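The idea behind a shard-aware gradient clip can be sketched as follows. This is only an illustration in plain Python, not the code from this commit: the names `local_sq_norm` and `clip_sharded_grads` are hypothetical, the "ranks" are simulated as entries of a list, and the sum over shards stands in for what would be a cross-rank reduction in a real distributed job.

```python
import math


def local_sq_norm(grads):
    """Sum of squared gradient values owned by a single shard (one rank)."""
    return sum(g * g for g in grads)


def clip_sharded_grads(shards, max_norm, eps=1e-6):
    """Clip gradients by their global L2 norm when they are split across shards.

    Assumption: `shards` is a list of lists of floats, one inner list per rank.
    In a real setup the sum below would be a collective reduction across ranks.
    """
    total_norm = math.sqrt(sum(local_sq_norm(s) for s in shards))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Every shard is scaled by the same coefficient, so the result matches
        # what a non-sharded clip over all gradients would produce.
        shards = [[g * clip_coef for g in s] for s in shards]
    return total_norm, shards


# Two simulated ranks, each holding one gradient value.
total_norm, clipped = clip_sharded_grads([[3.0], [4.0]], max_norm=1.0)
```

The point of the sketch is that each shard clipping against its own local norm would give a different (wrong) result; the norm has to be combined across shards before any scaling happens.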
-
- 06 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 14 Oct, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* fixing the issue wrt Apex, validated with Latte, Classy would need another pass
-
msbaines authored
-
- 08 Oct, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* new unit test to catch rank issues in OSS
-
- 15 Sep, 2020 2 commits
-
-
Benjamin Lefaudeux authored
Return either the local or global state when queried, depending on a prior consolidation
-
Benjamin Lefaudeux authored
Make OSS compatible with optimizers which do not support the closure argument
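One way a wrapper can tolerate optimizers without a closure argument is to forward the closure only when the caller supplied one. The sketch below is an assumption about the general pattern, not the code from this commit; `NoClosureOptimizer`, `ClosureOptimizer` and `wrapped_step` are hypothetical names used for illustration.

```python
class NoClosureOptimizer:
    """Hypothetical optimizer whose step() accepts no closure argument."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1
        return None


class ClosureOptimizer:
    """Hypothetical optimizer following the usual step(closure=None) convention."""

    def __init__(self):
        self.steps = 0

    def step(self, closure=None):
        loss = closure() if closure is not None else None
        self.steps += 1
        return loss


def wrapped_step(optim, closure=None):
    """Forward the closure only if one was given, so both kinds of optimizer work."""
    if closure is not None:
        return optim.step(closure=closure)
    return optim.step()


plain = NoClosureOptimizer()
result_plain = wrapped_step(plain)
result_closure = wrapped_step(ClosureOptimizer(), closure=lambda: 1.5)
```

Passing a closure to an optimizer that cannot accept one still fails in this sketch, which seems like the right behaviour: the caller asked for something the wrapped optimizer cannot do.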
-
- 09 Sep, 2020 1 commit
-
-
Benjamin Lefaudeux authored
Changes the structure of the returned state dict with respect to the param_groups to make it closer to what a vanilla optimizer would return (un-shard them). Shard again when loading
-
- 08 Sep, 2020 1 commit
-
-
Benjamin Lefaudeux authored
Make sure that all attributes (not just the LR) stay in sync between OSS.param_groups and the actual wrapped optimizer. Some frameworks make it possible to alter any attribute on a scheduled basis, which proves useful depending on the optimizer, so the keys need to be supported generically (not just "lr"). Failing to sync these attributes is a worst-case scenario, because the adjustments are silently dropped; this commit fixes that.
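A generic sync of this kind can be sketched with plain dictionaries. This is a minimal illustration under stated assumptions, not the code from this commit: `sync_param_groups` is a hypothetical helper, and the param groups are modelled as lists of dicts in which the `"params"` entry is the only key that must not be copied (because the wrapped optimizer holds only a shard of the parameters).

```python
def sync_param_groups(source_groups, target_groups):
    """Copy every hyperparameter from the exposed groups to the wrapped ones.

    All keys are propagated (lr, momentum, weight_decay, ...), except "params",
    which is assumed to differ on purpose between the two sides.
    """
    for src, dst in zip(source_groups, target_groups):
        for key, value in src.items():
            if key != "params":
                dst[key] = value


# Exposed groups, as a scheduler or framework might have modified them.
exposed = [{"params": ["w0", "w1"], "lr": 0.01, "momentum": 0.8}]
# Groups of the wrapped optimizer, holding only a shard of the parameters.
wrapped = [{"params": ["w0"], "lr": 0.1, "momentum": 0.9}]

sync_param_groups(exposed, wrapped)
```

Iterating over the keys rather than naming `"lr"` explicitly is what makes the sync robust to frameworks that schedule other attributes.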
-
- 03 Sep, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects
* Adding a check on the dict keys, ensure that `state` and `param_groups` are there
* after installing the specific isort, black and all, one liner to please the linter..
-
- 28 Aug, 2020 1 commit
-
-
msbaines authored
* [fix] optim/oss: work correctly with LRScheduler. Sync lr before every step and before consolidate.
-
- 27 Aug, 2020 3 commits
- 20 Aug, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* move the restored param groups to the original device
* adding a corresponding test
-
- 14 Aug, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* hotfix a half-cooked optimizer state restoration, the global shared state also needs to be restored
* [cleanup] get 100% coverage on oss.py (#38), authored by Mandeep Singh Baines <msb@fb.com>
* better unit testing, check that the .param_groups attribute is properly in sync with the loaded state
Co-authored-by: msbaines <35972327+msbaines@users.noreply.github.com>
-
- 13 Aug, 2020 1 commit
-
-
Benjamin Lefaudeux authored
Aligning OSS state dict with `https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html#Optimizer` (#31)
-
- 08 Aug, 2020 1 commit
-
-
Min Xu authored
Co-authored-by: Min Xu <m1n@fb.com>
-
- 31 Jul, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 08 Jul, 2020 1 commit
-
-
Mandeep Singh Baines authored
-