- 16 Dec, 2020 1 commit
Min Xu authored
* [doc] AdaScale example and notes
* formatted the notes correctly, as suggested by Benjamin
* added a feature and a unit test to make sure lr_scheduler works
* updated the example with lr_scheduler
* fixed the doc with "make html"
* addressed Mike's suggestions
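The lr_scheduler interplay above can be sketched in a few lines. This is a toy stand-in, not fairscale's AdaScale API: all class names and the fixed gain are hypothetical; the scheduler owns the base LR while the wrapper applies a gain only for the duration of each step.

```python
# Toy sketch (hypothetical names) of an AdaScale-style wrapper coexisting with
# an LR scheduler: the scheduler adjusts the base LR, and the wrapper
# multiplies it by a gain just for the step, then restores it.

class ToyOptimizer:
    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]
        self.effective_lrs = []          # record what each step actually used

    def step(self):
        self.effective_lrs.append(self.param_groups[0]["lr"])

class ToyAdaScale:
    def __init__(self, optimizer, gain=2.0):
        self.optimizer = optimizer
        self._gain = gain                # AdaScale estimates this from gradient statistics

    @property
    def param_groups(self):              # the scheduler talks to the same groups
        return self.optimizer.param_groups

    def step(self):
        group = self.optimizer.param_groups[0]
        base_lr = group["lr"]
        group["lr"] = base_lr * self._gain   # apply the gain only for this step
        self.optimizer.step()
        group["lr"] = base_lr                # restore so the scheduler stays consistent

opt = ToyAdaScale(ToyOptimizer(lr=0.1))
for epoch in range(2):
    opt.step()
    opt.param_groups[0]["lr"] *= 0.5     # stand-in for scheduler.step()
```

Restoring the base LR after each step is what keeps an external scheduler's view of `param_groups` consistent.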

- 14 Dec, 2020 1 commit

Min Xu authored
* better DDP AdaScale tests
* made sure the single-node test uses the same test cases and expected gains
* added a unit test that covers the smoothing factor, verified by re-introducing the bug and seeing the test fail as expected

- 06 Dec, 2020 1 commit

Min Xu authored

- 03 Dec, 2020 1 commit

Min Xu authored
* added AdaScale to the README
* [adascale] added gradient accumulation
  - tested with full CIFAR trainings at different accumulation values and verified that full accuracy is obtained
  - also removed the patch-optimizer flag until we need it
* [adascale] added pytest coverage
  - added basic, DDP, and grad_accum tests
  - closes #195
* added a changelog entry
* added a DDP grad_accum test
* moved the DDP and non-DDP tests into separate files
* added a checkpoint test
* more docs
* addressed Mike's comments
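The gradient accumulation this commit adds can be sketched with plain numbers (hypothetical helper, not fairscale's implementation): gradients from N micro-batches are summed, averaged, and applied in a single optimizer step, emulating an N-times-larger batch.

```python
# Minimal sketch of gradient accumulation: sum per-micro-batch gradients,
# average them, and take one optimizer step for the whole accumulated batch.

def accumulate_and_step(micro_batch_grads, apply_update):
    n = len(micro_batch_grads)
    accum = 0.0
    for g in micro_batch_grads:   # in PyTorch, backward() adds into .grad
        accum += g
    apply_update(accum / n)       # a single step for the accumulated batch

updates = []
accumulate_and_step([1.0, 2.0, 3.0, 6.0], updates.append)
```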

- 16 Nov, 2020 1 commit

Benjamin Lefaudeux authored
Add a gradient-clipping util, equivalent to torch's but aware of the sharded states, and a corresponding unit test.
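Why sharded states need their own clipping util can be shown with scalar stand-ins (hypothetical function, not fairscale's): clipping by global norm must combine the squared norms of every shard before rescaling, which in practice is an all-reduce across ranks.

```python
import math

# Sketch of clip-by-global-norm over sharded gradients: each shard contributes
# its local squared norm, the squares are summed across shards (an all-reduce
# in a real distributed setting), and every shard rescales with the same
# coefficient so the result matches unsharded clipping.

def clip_sharded_(shards, max_norm, eps=1e-6):
    total_sq = sum(sum(g * g for g in shard) for shard in shards)
    total_norm = math.sqrt(total_sq)
    if total_norm > max_norm:
        coef = max_norm / (total_norm + eps)
        for shard in shards:
            shard[:] = [g * coef for g in shard]
    return total_norm

shards = [[3.0, 0.0], [0.0, 4.0]]       # global norm = 5
norm = clip_sharded_(shards, max_norm=1.0)
```

Clipping each shard by its own local norm would give a different (wrong) result, which is the pitfall this util avoids.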

- 06 Nov, 2020 1 commit

Benjamin Lefaudeux authored

- 28 Oct, 2020 1 commit

msbaines authored

- 14 Oct, 2020 2 commits

Benjamin Lefaudeux authored
* fixed the issue w.r.t. Apex; validated with Latte; Classy would need another pass

msbaines authored

- 08 Oct, 2020 1 commit

Benjamin Lefaudeux authored
* new unit test to catch rank issues in OSS

- 15 Sep, 2020 2 commits

Benjamin Lefaudeux authored
Return either the local or global state when queried, depending on a prior consolidation
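The behaviour described here can be sketched with plain dicts (all names hypothetical, not OSS's actual code): the full, gathered state is returned only if a consolidation ran beforehand; otherwise the caller gets the local shard.

```python
# Sketch of "local or global state depending on a prior consolidation": the
# holder returns the gathered shards only after consolidate_state_dict() ran.

class ShardedStateHolder:
    def __init__(self, local_state):
        self.local_state = local_state
        self._consolidated = None

    def consolidate_state_dict(self, all_states):
        # in practice this is a gather of every rank's shard onto one rank
        self._consolidated = list(all_states)

    def state_dict(self):
        if self._consolidated is not None:
            return {"shards": self._consolidated}    # global view
        return {"shards": [self.local_state]}        # local view only

opt = ShardedStateHolder({"step": 3})
local = opt.state_dict()
opt.consolidate_state_dict([{"step": 3}, {"step": 3}])
full = opt.state_dict()
```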
Benjamin Lefaudeux authored
Make OSS compatible with optimizers which do not support the closure argument
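One way to support optimizers whose step() has no closure argument is to inspect the wrapped signature and only forward the closure when it is accepted, evaluating it separately otherwise. A sketch with a hypothetical wrapper function:

```python
import inspect

# Sketch of a closure-optional step: forward the closure only to optimizers
# that accept one; for the rest, evaluate the closure first, then step.

def step_with_optional_closure(optimizer_step, closure=None):
    takes_closure = "closure" in inspect.signature(optimizer_step).parameters
    if closure is None:
        return optimizer_step()
    if takes_closure:
        return optimizer_step(closure)
    loss = closure()      # evaluate the loss ourselves...
    optimizer_step()      # ...then step without the closure
    return loss

calls = []
def step_without_closure():
    calls.append("stepped")

def step_with_closure(closure=None):
    return closure()

loss_a = step_with_optional_closure(step_without_closure, closure=lambda: 1.5)
loss_b = step_with_optional_closure(step_with_closure, closure=lambda: 2.5)
```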

- 09 Sep, 2020 1 commit

Benjamin Lefaudeux authored
Change the structure of the returned state dict with respect to the param_groups, un-sharding them so the dict is closer to what a vanilla optimizer would return. Shard again when loading.
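Un-sharding on save and re-sharding on load can be sketched like this (hypothetical helpers, scalar params; the round-robin partition is just an illustration, the real partition strategy is an implementation detail):

```python
# Sketch of un-sharding param_groups for the saved state dict and
# partitioning them back into per-rank shards on load.

def unshard(per_rank_groups):
    # flatten every rank's param groups into one global list, as a vanilla
    # optimizer's state dict would contain
    return {"param_groups": [g for rank in per_rank_groups for g in rank]}

def reshard(state, world_size):
    # partition the global list back into per-rank shards
    per_rank = [[] for _ in range(world_size)]
    for i, group in enumerate(state["param_groups"]):
        per_rank[i % world_size].append(group)
    return per_rank

per_rank = [[{"params": [0]}, {"params": [2]}], [{"params": [1]}]]
saved = unshard(per_rank)
shards = reshard(saved, world_size=2)
```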

- 08 Sep, 2020 1 commit

Benjamin Lefaudeux authored
Make sure that all attributes (not just the LR) are kept in sync between OSS.param_groups and the actual wrapped optimizer. Some frameworks make it possible to alter any attribute on a scheduled basis, which proves useful depending on the optimizer, so the keys need to be supported generically (not just "lr"). Not syncing these attributes was a worst-case scenario, since the adjustments were silently not propagated; this fixes that.
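The generic sync can be sketched as copying every key except the parameter lists (hypothetical function, plain dicts standing in for param groups):

```python
# Sketch of syncing *every* scheduled attribute of the exposed param_groups
# down to the wrapped optimizer's groups, not just "lr", while leaving the
# sharded parameter lists untouched.

def sync_param_groups(source_groups, dest_groups):
    for src, dst in zip(source_groups, dest_groups):
        for key, value in src.items():
            if key != "params":          # parameter shards stay as they are
                dst[key] = value

wrapper_groups = [{"lr": 0.01, "momentum": 0.95, "params": [0, 1]}]
inner_groups = [{"lr": 0.1, "momentum": 0.9, "params": [0]}]
sync_param_groups(wrapper_groups, inner_groups)
```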

- 03 Sep, 2020 2 commits

Jun Ru Anderson authored
Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require it in order to converge. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
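The loop a GradScaler automates can be sketched with plain floats (hypothetical helper, not PyTorch's API; a real scaler also grows the scale only after a run of good steps, not after every one): scale the loss before backward, unscale the gradients before stepping, and skip the step while backing off the scale on overflow.

```python
# Sketch of one loss-scaling iteration: scale up, unscale, and either step
# (growing the scale) or skip the step on overflow (shrinking the scale).

def scaled_step(grad, scale, growth=2.0, backoff=0.5):
    scaled_grad = grad * scale          # what backward() on a scaled loss yields
    unscaled = scaled_grad / scale      # unscale before the optimizer step
    overflow = unscaled != unscaled or abs(unscaled) == float("inf")
    if overflow:
        return None, scale * backoff    # skip the step, shrink the scale
    return unscaled, scale * growth     # step normally, grow the scale

good_grad, next_scale = scaled_step(0.25, scale=1024.0)
bad_grad, reduced_scale = scaled_step(float("inf"), scale=1024.0)
```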
Benjamin Lefaudeux authored
* aligned the optimizer state dict with what PyTorch expects
* added a check on the dict keys, ensuring that `state` and `param_groups` are there
* after installing the specific isort, black and all, a one-liner to please the linter

- 28 Aug, 2020 1 commit

msbaines authored
* [fix] optim/oss: work correctly with LRScheduler; sync the LR before every step and before consolidating

- 27 Aug, 2020 3 commits

- 22 Aug, 2020 1 commit

Jun Ru Anderson authored
Implement scaling of the optimizer state when using pure-fp16 training, to avoid underflow. Update the benchmark to use pure fp16. Modify the state_dict methods to store and load the optimizer state scale. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
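Why the state is kept pre-multiplied by a scale can be shown numerically. The cast below is a crude, hypothetical stand-in for fp16 that only models underflow (fp16 flushes magnitudes below roughly 2**-24 to zero); tiny running averages survive only in scaled form, and the scale is divided back out on use.

```python
# Sketch of optimizer-state scaling for pure-fp16 training: a value that would
# underflow in fp16 is stored multiplied by a scale, then recovered exactly.

TINY = 2.0 ** -24

def to_fp16_like(x):
    return 0.0 if abs(x) < TINY else x   # simulate fp16 underflow only

state_scale = 2.0 ** 16
true_value = 2.0 ** -30                  # underflows in fp16

unscaled_stored = to_fp16_like(true_value)                 # lost
scaled_stored = to_fp16_like(true_value * state_scale)     # survives
recovered = scaled_stored / state_scale
```

Storing the scale in the state_dict, as the commit does, is what makes checkpoints round-trip correctly.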

- 21 Aug, 2020 1 commit

Jun Ru Anderson authored
Set the torch seed for tests. xfail the mixed-precision and memory-efficient mixed-precision state_dict tests, since their states are cast to FP16 and back to FP32 during load_state_dict. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

- 20 Aug, 2020 1 commit

Benjamin Lefaudeux authored
* moved the restored param groups to the original device
* added a corresponding test

- 19 Aug, 2020 1 commit

Jun Ru Anderson authored
Refactor the tests to remove duplicated code. Fix the state_dict test to instantiate the second optimizer with the correct precision. Fix Adam.load_state_dict to cast the optimizer state to the right type. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

- 18 Aug, 2020 1 commit

Jun Ru Anderson authored
Allow training with the optimizer state in fp16. Use an enum to select from full precision, mixed precision, memory-efficient mixed precision, and pure fp16. Improve the clarity of the testing code. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

- 14 Aug, 2020 2 commits

Jun Ru Anderson authored
Add support for mixed-precision (half-precision params, full-precision gradients) and memory-efficient (half-precision params and half-precision gradients) training with Adam. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
Benjamin Lefaudeux authored
* hotfix for a half-cooked optimizer state restoration; the global shared state also needs to be restored
* [cleanup] get 100% coverage on oss.py (#38)
* better unit testing; check that the .param_groups attribute is properly in sync with the loaded state
Co-authored-by: Mandeep Singh Baines <msb@fb.com>
Co-authored-by: msbaines <35972327+msbaines@users.noreply.github.com>

- 13 Aug, 2020 1 commit

Benjamin Lefaudeux authored
Aligning OSS state dict with `https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html#Optimizer` (#31)

- 08 Aug, 2020 1 commit

Min Xu authored
Co-authored-by: Min Xu <m1n@fb.com>

- 31 Jul, 2020 2 commits

Benjamin Lefaudeux authored
Jun Ru Anderson authored
Add FusedAdam, update the benchmark, and add tests. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
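The math FusedAdam fuses into a single kernel is the standard Adam update; a scalar sketch (standard Adam, not FusedAdam's actual implementation):

```python
import math

# One scalar Adam step: exponential moving averages of the gradient and its
# square, bias correction, then the parameter update.

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad             # first-moment EMA
    v = b2 * v + (1 - b2) * grad * grad      # second-moment EMA
    m_hat = m / (1 - b1 ** t)                # bias correction
    v_hat = v / (1 - b2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = adam_step(1.0, 0.5, m=0.0, v=0.0, t=1)
```

A fused implementation performs these elementwise operations in one pass over the parameters instead of several, which is where the speedup comes from.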

- 08 Jul, 2020 1 commit

Mandeep Singh Baines authored