- 23 Apr, 2023 1 commit
aspanday authored
* Updating BLOCK_SIZE to 1024. tests/L0/run_optimizers/test_fused_optimizer.py passes except for bfloat16 with Adam; there seems to be a bug in that test that still needs to be resolved, so for now test_bfloat16 for Adam is skipped in the unittest. Ran 17 other tests and ALL of them pass! More details on the effects of these changes can be found here - https://confluence.amd.com/display/MLSE/Apex+Kernel+Optimization . This commit changes BLOCK_SIZE=1024 ONLY for the optimizers; the L2norm kernels (part of the LAMB optimizer algorithm) keep BLOCK_SIZE=512, since the allclose check fails otherwise.
* Updating tests/L0/run_optimizers/test_fused_optimizer.py with @skipIfRocm to skip test_bfloat16 for Adam.

Co-authored-by: aspanday <aspanday@amd.com>
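The skip looks something like the minimal sketch below, assuming the skipIfRocm decorator that PyTorch ships in torch.testing._internal.common_utils (apex's test suite may import or define its own equivalent); the class name here is illustrative:

```python
import unittest

# Assumption: skipIfRocm comes from PyTorch's internal test utilities;
# apex's suite may use an equivalent helper.
from torch.testing._internal.common_utils import skipIfRocm

class TestFusedAdam(unittest.TestCase):  # illustrative class name
    @skipIfRocm  # known bug in the bfloat16 Adam test on ROCm; skip for now
    def test_bfloat16(self):
        ...
```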
-
- 25 Feb, 2021 1 commit
Jeff Daily authored
This reverts commit bdd481d1.
-
- 05 Aug, 2020 1 commit
ngimel authored
* add device guards to the optimizers
* add untracked file
* set deviceGuard in multi_tensor_apply
* address review comments; fix lamb
* indent
* typo
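The guards matter when a single optimizer holds parameters on several GPUs: each kernel launch has to target the device that owns its tensors. A minimal Python sketch of the idea, not the actual change (which installs a CUDA device guard inside the C++ multi_tensor_apply helper); the function names are illustrative and the parameters are assumed to be CUDA tensors:

```python
import torch

def guarded_step(params, update_fn):
    # Switch the active CUDA device to the one owning each parameter before
    # launching work on it; the context manager restores the device on exit.
    for p in params:
        with torch.cuda.device(p.device):
            update_fn(p)
```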
-
- 21 May, 2020 1 commit
Jeff Daily authored
-
- 20 May, 2020 1 commit
lcskrishna authored
-
- 12 May, 2020 1 commit
rohithkrn authored
-
- 03 Jul, 2019 1 commit
Michael Carilli authored
-
- 27 May, 2019 1 commit
Michael Carilli authored
-
- 10 May, 2019 1 commit
Michael Carilli authored
-
- 26 Apr, 2019 2 commits
Michael Carilli authored
-
Michael Carilli authored
-
- 25 Apr, 2019 1 commit
Michael Carilli authored
-
- 22 Apr, 2019 1 commit
Michael Carilli authored
-
- 18 Apr, 2019 1 commit
Michael Carilli authored
-
- 11 Mar, 2019 2 commits
Simon Layton authored
-
Simon Layton authored
Fix dispatch where we have a parameter group with multiple combinations of types.
Optionally apply weight decay after momentum.
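The second change adds a choice of where the coupled weight-decay term enters the update. A minimal Python sketch of the two placements, mirroring torch.optim.SGD's math with dampening and Nesterov omitted for brevity; the wd_after_momentum flag name matches the option apex's FusedSGD later exposes, and the function itself is illustrative:

```python
import torch

def sgd_update(p, grad, buf, lr, momentum, wd, wd_after_momentum):
    if wd != 0.0 and not wd_after_momentum:
        grad = grad + wd * p           # classic: decay folded into the gradient
    buf.mul_(momentum).add_(grad)      # momentum accumulation
    update = buf.clone()
    if wd != 0.0 and wd_after_momentum:
        update.add_(p, alpha=wd)       # new option: decay applied after momentum,
                                       # so the buffer never accumulates the decay term
    p.add_(update, alpha=-lr)          # parameter step
```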
-
- 09 Mar, 2019 1 commit
Simon Layton authored
-
- 08 Mar, 2019 5 commits
Simon Layton authored
-
Simon Layton authored
Incorrect types used in a few places
-
Simon Layton authored
Only support the 4 specific cases we care about.
Remove the more general set of switch statements.
-
Simon Layton authored
Fuse in the fp16 gradient -> fp32 convert.
Additional option: fp16 weight copy written out.
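Without fusion, the fp16 gradients are cast to fp32 in a separate pass before the fp32 update runs; this commit folds the cast into the optimizer kernel and can also write the updated weights back out as an fp16 copy. A minimal Python sketch of the fused step's per-tensor behavior (names are illustrative; the real code is a CUDA kernel):

```python
import torch

def fused_fp16_sgd_step(param32, grad16, buf, lr, momentum, param16=None):
    grad32 = grad16.float()            # fp16 -> fp32 convert, fused into the kernel
    buf.mul_(momentum).add_(grad32)    # fp32 momentum update
    param32.add_(buf, alpha=-lr)       # step on the fp32 master weights
    if param16 is not None:            # optional fused fp16 weight copy
        param16.copy_(param32.half())
```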
-
Simon Layton authored
Initial implementation, all fp32.
Tested against torch.optim.sgd.
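A parity check along those lines might look like the sketch below. This is hedged, since the commit doesn't show the test itself; FusedSGD is apex's eventual public class name for this optimizer and is assumed here:

```python
import copy
import torch
import apex

# Identical models and hyperparameters, reference vs. fused optimizer.
ref_model = torch.nn.Linear(32, 32).cuda()
tst_model = copy.deepcopy(ref_model)
ref_opt = torch.optim.SGD(ref_model.parameters(), lr=0.1, momentum=0.9)
tst_opt = apex.optimizers.FusedSGD(tst_model.parameters(), lr=0.1, momentum=0.9)

for _ in range(10):
    x = torch.randn(8, 32, device="cuda")
    for model, opt in ((ref_model, ref_opt), (tst_model, tst_opt)):
        opt.zero_grad()
        model(x).sum().backward()
        opt.step()

# An all-fp32 fused path should track the reference update very closely.
for p_ref, p_tst in zip(ref_model.parameters(), tst_model.parameters()):
    torch.testing.assert_close(p_tst, p_ref)
```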
-