- 10 Sep, 2020 1 commit
-
-
msbaines authored
-
- 09 Sep, 2020 7 commits
-
-
msbaines authored
-
msbaines authored
-
msbaines authored
-
Benjamin Lefaudeux authored
Changes the structure of the returned state dict with respect to the param_groups, making it closer to what a vanilla optimizer would return (the param_groups are un-sharded). They are sharded again when loading.
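A minimal sketch of the un-shard / re-shard idea, using plain dicts instead of real optimizers. The helper names `unshard_param_groups` and `reshard_param_groups` and the `partition` bookkeeping are hypothetical, and the actual OSS code may well be organised differently.

```python
from typing import Any, Dict, List, Tuple

ParamGroup = Dict[str, Any]


def unshard_param_groups(
    shards: List[List[ParamGroup]],
) -> Tuple[List[ParamGroup], List[Tuple[int, int]]]:
    """Concatenate per-rank param_groups into one flat list.

    Also returns, for each rank, the (start, end) slice it owns, so the
    flat list can be split again on load.
    """
    merged: List[ParamGroup] = []
    partition: List[Tuple[int, int]] = []
    for groups in shards:
        start = len(merged)
        merged.extend(groups)
        partition.append((start, len(merged)))
    return merged, partition


def reshard_param_groups(
    merged: List[ParamGroup], partition: List[Tuple[int, int]], rank: int
) -> List[ParamGroup]:
    """Recover the param_groups owned by `rank` from the flat list."""
    start, end = partition[rank]
    return merged[start:end]


# Two ranks, one param group each (parameter indices are made up).
shards = [
    [{"lr": 0.1, "params": [0, 1]}],
    [{"lr": 0.1, "params": [2]}],
]
merged, partition = unshard_param_groups(shards)
```

Saving `merged` gives a `param_groups` list shaped like the one a non-sharded optimizer would produce, while `partition` is the extra information needed to shard again on load.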
-
msbaines authored
-
msbaines authored
Needed for working correctly with readthedocs.org
-
msbaines authored
-
- 08 Sep, 2020 1 commit
-
-
Benjamin Lefaudeux authored
Make sure that all attributes (not just LR) are in sync between OSS.param_groups and the actual wrapped optimizer. Some frameworks make it possible to alter any attribute on a scheduled basis, which proves useful depending on the optimizer, so the keys need to be generically supported (not just "lr"). Not syncing these attributes is a worst-case scenario, since the adjustments are silently dropped instead of being propagated; this commit fixes that.
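The generic sync described here could look roughly like the following. This is a sketch on plain dicts, with the hypothetical helper name `sync_param_groups`; it assumes the exposed groups and the wrapped optimizer's groups line up one to one, and that `params` is the only key that must not be copied.

```python
from typing import Any, Dict, List


def sync_param_groups(
    source_groups: List[Dict[str, Any]], target_groups: List[Dict[str, Any]]
) -> None:
    """Copy every hyperparameter from source to target, group by group.

    "params" is skipped because each side holds its own parameter list.
    """
    for src, dst in zip(source_groups, target_groups):
        for key, value in src.items():
            if key != "params":
                dst[key] = value


# A scheduler changed both lr and momentum on the outer groups...
outer = [{"params": [0, 1], "lr": 0.01, "momentum": 0.5}]
# ...while the wrapped optimizer still holds the old values.
inner = [{"params": [0], "lr": 0.1, "momentum": 0.9}]
sync_param_groups(outer, inner)
```

Iterating over all keys, instead of hard-coding `"lr"`, is what keeps a scheduled change to any attribute from being lost.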
-
- 04 Sep, 2020 1 commit
-
-
msbaines authored
Built via: $ conda build --python 3.8 .
-
- 03 Sep, 2020 3 commits
-
-
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects
* Adding a check on the dict keys, ensure that `state` and `param_groups` are there
* after installing the specific isort, black and all, one liner to please the linter..
* Adding some measurement of the memory consumption while training + checkpointing
* mandatory lintfix commit
* brainfart, reset the memory use counter at the beginning of the training in case two of them are run in a row
* move reset stats call, hotfix
* move the optimizer to rmsprop, more stateful and still used in CV
* trying to figure out a SIGSEGV in CircleCI
-
Jun Ru Anderson authored
Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require gradient scaling in order to converge.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects
* Adding a check on the dict keys, ensure that `state` and `param_groups` are there
* after installing the specific isort, black and all, one liner to please the linter..
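A check on the dict keys of the kind mentioned above might look like this. It is only a sketch: the function name `check_state_dict_keys` and the choice of `KeyError` are assumptions, not taken from the commit.

```python
from typing import Any, Dict

REQUIRED_KEYS = {"state", "param_groups"}


def check_state_dict_keys(state_dict: Dict[str, Any]) -> None:
    """Raise if the optimizer state dict lacks the keys PyTorch expects."""
    missing = REQUIRED_KEYS - state_dict.keys()
    if missing:
        raise KeyError(f"optimizer state dict is missing keys: {sorted(missing)}")


# A well-formed (empty) state dict passes silently.
check_state_dict_keys({"state": {}, "param_groups": []})

# A malformed one is rejected before any loading happens.
try:
    check_state_dict_keys({"state": {}})
    rejected = False
except KeyError:
    rejected = True
```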
-
- 28 Aug, 2020 4 commits
-
-
msbaines authored
-
Jun Ru Anderson authored
* specify chunks for pipe/transformer benchmark
  Set chunks to be equal to len(balance) for pipe/transformer benchmark. Will update words per second and memory usage checks in next commit (must test on CircleCI to find appropriate values)
* change benchmark words per second and memory usage
  Did six runs for words-per-second, with results: 9144.40, 9163.91, 9993.01, 9082.82, 9155.09, 9000.67
  Peak allocated bytes per device (which does not change between runs) were 193206272, 645632, 562688, 92688384 for devices 0, 1, 2 and 3, respectively
* increase batch size
  batch size was small enough that the GPU's computing power was not the bottleneck, slowing training and specifically making more chunks slower. Increasing batch size has therefore increased training speed
* update benchmark numbers
  ran six times, with wps 36917.44, 36797.65, 37006.03, 36872.84, 37129.31, 37003.31 and peak allocated bytes 4061909504, 4050944, 10427392, 2031824896 for devices 0, 1, 2 and 3 respectively.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
msbaines authored
* [fix] optim/oss: work correctly with LRScheduler
  Sync lr before every step and before consolidate.
-
Min Xu authored
- added train(mode) method to be aware of eval mode
-
- 27 Aug, 2020 4 commits
-
-
msbaines authored
Work around a PyTorch bug that casts state (pytorch/pytorch#43706). Copied from https://github.com/pytorch/fairseq/blob/v0.9.0/fairseq/optim/fp16_optimizer.py#L251-L268
-
msbaines authored
-
msbaines authored
-
msbaines authored
* [fix] optim/oss: support optimizers with additional step kwargs
  Some of the optimizers in apex support additional kwargs to step such as scale.
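One way a wrapper can support extra `step` kwargs is to forward them untouched. The sketch below uses a toy stand-in optimizer; `ToyOptimizer` and `ForwardingWrapper` are hypothetical names and do not reproduce the OSS or apex classes.

```python
from typing import Any, Callable, Optional


class ToyOptimizer:
    """Stand-in for an optimizer whose step() accepts an extra `scale` kwarg."""

    def __init__(self) -> None:
        self.last_scale: Optional[float] = None

    def step(
        self, closure: Optional[Callable[[], float]] = None, scale: float = 1.0
    ) -> Optional[float]:
        self.last_scale = scale  # record what was received
        return closure() if closure is not None else None


class ForwardingWrapper:
    """Wrapper whose step() passes unknown kwargs through to the wrapped optimizer."""

    def __init__(self, optim: ToyOptimizer) -> None:
        self.optim = optim

    def step(
        self, closure: Optional[Callable[[], float]] = None, **kwargs: Any
    ) -> Optional[float]:
        return self.optim.step(closure=closure, **kwargs)


inner = ToyOptimizer()
wrapper = ForwardingWrapper(inner)
loss = wrapper.step(closure=lambda: 0.5, scale=128.0)
```

Accepting `**kwargs` keeps the wrapper agnostic to whichever additional arguments a given optimizer defines.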
-
- 22 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
Implement scaling of optimizer state when using pure-fp16 training to avoid underflow. Update benchmark to use pure-fp16. Modify state_dict methods to store and load the optimizer state scale.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
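To illustrate why scaling helps, the sketch below rounds values to half precision with the standard library's `struct` format `"e"`. The scale factor `2 ** 16` and the sample value `1e-8` are assumptions chosen for illustration, not values taken from the commit.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


STATE_SCALE = 2.0 ** 16  # assumed scale; a power of two keeps the division exact

tiny = 1e-8  # e.g. a very small second-moment estimate

# Stored directly in fp16, the value is below the smallest subnormal
# and is flushed to zero.
unscaled = to_fp16(tiny)

# Stored as value * scale, it lands in fp16's normal range and can be
# recovered (up to fp16 rounding) by dividing the scale back out.
scaled = to_fp16(tiny * STATE_SCALE) / STATE_SCALE
```

Because the stored numbers only make sense together with the scale, the scale has to travel with the state, which matches the commit's note about storing and loading it in the state_dict methods.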
-
- 21 Aug, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* initial commit, dummy training loop, pure pytorch but not DDP
* probably slightly broken, but rough DDP benchmark run
* adding the torchvision requirement for testing
* brainfart
* reduce the loss, do something slightly distributed
* Some cleanup, distributing the training on two GPUs
* some cleanup + adding a vanilla run, still not good to go
* less silly defaults, gtg for a start I think
* smaller batch to fit the smaller gpus used in the circleci rigs
* Adding some options for the benchmark, and regression testing
* [test] set torch seed for Adam tests (#49)
  Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
  Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
* linting, I really need to automate this isort insanity
Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
Jun Ru Anderson authored
Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
- 20 Aug, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* move the restored param groups to the original device * adding a corresponding test
-
- 19 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
Refactor tests to remove duplicated code. Fix the state_dict test to instantiate the second optimizer with the correct precision. Fix Adam.load_state_dict to make optimizer state the right type.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
- 18 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
Allow training with optimizer state in fp16. Use an enum to select from full-precision, mixed precision, memory efficient mixed precision and pure fp16. Improve clarity of testing code.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
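An enum-based selection of the four modes could be sketched as follows. The member names and the helper `optimizer_state_dtype` are illustrative assumptions based only on the wording of this commit message, and the mapping assumes that pure fp16 is the only mode keeping optimizer state in half precision.

```python
from enum import Enum, auto


class Precision(Enum):
    """The four training modes named in the commit message (names assumed)."""

    FULL_PRECISION = auto()
    MIXED_PRECISION = auto()
    MEMORY_EFFICIENT_MIXED_PRECISION = auto()
    PURE_FP16 = auto()


def optimizer_state_dtype(precision: Precision) -> str:
    """Return the dtype name used for optimizer state in a given mode.

    Assumption: only PURE_FP16 stores state in half precision.
    """
    return "float16" if precision is Precision.PURE_FP16 else "float32"


chosen = Precision.PURE_FP16
state_dtype = optimizer_state_dtype(chosen)
```

An enum makes the four modes mutually exclusive, which a set of independent boolean flags would not.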
-
- 14 Aug, 2020 5 commits
-
-
Jun Ru Anderson authored
Add support for mixed-precision (half precision params, full precision gradients) and memory-efficient (half precision params and half precision gradients) training with Adam.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
Benjamin Lefaudeux authored
* hotfix a half-cooked optimizer state restoration, the global shared state also needs to be restored
* [cleanup] get 100% coverage on oss.py (#38)
  authored-by: Mandeep Singh Baines <msb@fb.com>
* better unit testing, check that the .param_groups attribute is properly in sync with the loaded state
Co-authored-by: msbaines <35972327+msbaines@users.noreply.github.com>
-
msbaines authored
authored-by: Mandeep Singh Baines <msb@fb.com>
-
msbaines authored
* Set baseline coverage to 94%
-
msbaines authored
-
- 13 Aug, 2020 5 commits
-
-
msbaines authored
-
Jun Ru Anderson authored
-
msbaines authored
-
Jun Ru Anderson authored
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
-
Benjamin Lefaudeux authored
Aligning OSS state dict with `https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html#Optimizer` (#31)
-
- 08 Aug, 2020 1 commit
-
-
Min Xu authored
Co-authored-by: Min Xu <m1n@fb.com>
-
- 06 Aug, 2020 2 commits
-
-
Min Xu authored
Co-authored-by: Min Xu <m1n@fb.com>
-
Min Xu authored
Co-authored-by: Min Xu <m1n@fb.com>
-