- 23 Apr, 2023 1 commit
aspanday authored
* Updating BLOCK_SIZE to 1024. tests/L0/run_optimizers/test_fused_optimizer.py passes except for bfloat16 with Adam; there seems to be a bug in that test that still needs to be resolved, so for now test_bfloat16 for Adam is skipped in the unittest. Ran 17 other tests and ALL of them pass! More details on the effects of these changes can be found here - https://confluence.amd.com/display/MLSE/Apex+Kernel+Optimization . This commit changes BLOCK_SIZE=1024 ONLY for the optimizers; the L2norm kernels (part of the LAMB optimizer algorithm) keep BLOCK_SIZE=512, since the allclose check fails otherwise.
* Updating tests/L0/run_optimizers/test_fused_optimizer.py with @skipIfRocm to skip test_bfloat16 for Adam.

Co-authored-by: aspanday <aspanday@amd.com>
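The skip looks something like the minimal sketch below, assuming the skipIfRocm decorator that PyTorch ships in torch.testing._internal.common_utils (apex's test suite may import or define its own equivalent); the class name here is illustrative:

```python
import unittest

# Assumption: skipIfRocm comes from PyTorch's internal test utilities;
# apex's suite may use an equivalent helper.
from torch.testing._internal.common_utils import skipIfRocm

class TestFusedAdam(unittest.TestCase):  # illustrative class name
    @skipIfRocm  # known bug in the bfloat16 Adam test on ROCm; skip for now
    def test_bfloat16(self):
        ...
```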
-
- 25 Feb, 2021 1 commit
Jeff Daily authored
This reverts commit bdd481d1.
-
- 05 Aug, 2020 1 commit
ngimel authored
* add device guards to the optimizers
* add untracked file
* set deviceGuard in multi_tensor_apply
* address review comments; fix lamb
* indent
* typo
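The guards matter when a single optimizer holds parameters on several GPUs: each kernel launch has to target the device that owns its tensors. A minimal Python sketch of the idea, not the actual change (which installs a CUDA device guard inside the C++ multi_tensor_apply helper); the function names are illustrative and the parameters are assumed to be CUDA tensors:

```python
import torch

def guarded_step(params, update_fn):
    # Switch the active CUDA device to the one owning each parameter before
    # launching work on it; the context manager restores the device on exit.
    for p in params:
        with torch.cuda.device(p.device):
            update_fn(p)
```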
-
- 21 May, 2020 1 commit
Jeff Daily authored
-
- 20 May, 2020 1 commit
lcskrishna authored
-
- 12 May, 2020 1 commit
rohithkrn authored
-
- 03 Jul, 2019 1 commit
Michael Carilli authored
-
- 27 May, 2019 1 commit
Michael Carilli authored
-
- 10 May, 2019 1 commit
Michael Carilli authored
-
- 26 Apr, 2019 2 commits
Michael Carilli authored
-
Michael Carilli authored
-
- 25 Apr, 2019 1 commit
Michael Carilli authored
-
- 22 Apr, 2019 1 commit
Michael Carilli authored
-
- 18 Apr, 2019 1 commit
Michael Carilli authored
-
- 11 Mar, 2019 2 commits
Simon Layton authored
-
Simon Layton authored
Fix dispatch where we have a parameter group with multiple combinations of types.
Optionally apply weight decay after momentum.
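The second change adds a choice of where the coupled weight-decay term enters the update. A minimal Python sketch of the two placements, mirroring torch.optim.SGD's math with dampening and Nesterov omitted for brevity; the wd_after_momentum flag name matches the option apex's FusedSGD later exposes, and the function itself is illustrative:

```python
import torch

def sgd_update(p, grad, buf, lr, momentum, wd, wd_after_momentum):
    if wd != 0.0 and not wd_after_momentum:
        grad = grad + wd * p           # classic: decay folded into the gradient
    buf.mul_(momentum).add_(grad)      # momentum accumulation
    update = buf.clone()
    if wd != 0.0 and wd_after_momentum:
        update.add_(p, alpha=wd)       # new option: decay applied after momentum,
                                       # so the buffer never accumulates the decay term
    p.add_(update, alpha=-lr)          # parameter step
```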
-
- 09 Mar, 2019 1 commit
Simon Layton authored
-
- 08 Mar, 2019 5 commits
Simon Layton authored
-
Simon Layton authored
Incorrect types used in a few places
-
Simon Layton authored
Only support the 4 specific cases we care about.
Remove the more general set of switch statements.
-
Simon Layton authored
Fuse in the fp16 gradient -> fp32 convert.
Additional option: fp16 weight copy written out.
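Without fusion, the fp16 gradients are cast to fp32 in a separate pass before the fp32 update runs; this commit folds the cast into the optimizer kernel and can also write the updated weights back out as an fp16 copy. A minimal Python sketch of the fused step's per-tensor behavior (names are illustrative; the real code is a CUDA kernel):

```python
import torch

def fused_fp16_sgd_step(param32, grad16, buf, lr, momentum, param16=None):
    grad32 = grad16.float()            # fp16 -> fp32 convert, fused into the kernel
    buf.mul_(momentum).add_(grad32)    # fp32 momentum update
    param32.add_(buf, alpha=-lr)       # step on the fp32 master weights
    if param16 is not None:            # optional fused fp16 weight copy
        param16.copy_(param32.half())
```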
-
Simon Layton authored
Initial implementation, all fp32.
Tested against torch.optim.sgd.
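A parity check along those lines might look like the sketch below. This is hedged, since the commit doesn't show the test itself; FusedSGD is apex's eventual public class name for this optimizer and is assumed here:

```python
import copy
import torch
import apex

# Identical models and hyperparameters, reference vs. fused optimizer.
ref_model = torch.nn.Linear(32, 32).cuda()
tst_model = copy.deepcopy(ref_model)
ref_opt = torch.optim.SGD(ref_model.parameters(), lr=0.1, momentum=0.9)
tst_opt = apex.optimizers.FusedSGD(tst_model.parameters(), lr=0.1, momentum=0.9)

for _ in range(10):
    x = torch.randn(8, 32, device="cuda")
    for model, opt in ((ref_model, ref_opt), (tst_model, tst_opt)):
        opt.zero_grad()
        model(x).sum().backward()
        opt.step()

# An all-fp32 fused path should track the reference update very closely.
for p_ref, p_tst in zip(ref_model.parameters(), tst_model.parameters()):
    torch.testing.assert_close(p_tst, p_ref)
```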
-