Commits · 2e2584fc66cceef6acc038c309e0e98f394428ec · OpenDAS / apex

20 May, 2020 1 commit
- skip tests that are failing after bfp16 · 2e2584fc
  lcskrishna authored May 20, 2020
  
  2e2584fc
19 May, 2020 7 commits
- Merge branch 'master' of https://github.com/ROCmSoftwarePlatform/apex into cl/enable-test-framework · 4ac8ecb9
  lcskrishna authored May 19, 2020
  
  4ac8ecb9
- Merge pull request #5 from rohithkrn/apex_amp_bfp16 · b2da92fc
  Peng authored May 19, 2020
```
Introduce new optimization levels for BFloat16 training
```
  b2da92fc
- remove unnecessary comments · bc626b13
  lcskrishna authored May 19, 2020
  
  bc626b13
- enable run_optimizer tests · 49db74c8
  lcskrishna authored May 19, 2020
  
  49db74c8
- enable run_amp tests · 464e95f5
  lcskrishna authored May 19, 2020
  
  464e95f5
- enable fp16_utils test suite · d0555980
  lcskrishna authored May 19, 2020
  
  d0555980
- create a base framework for adding tests · a73d7d3b
  lcskrishna authored May 19, 2020
  
  a73d7d3b
18 May, 2020 1 commit
- enable multi tensor apply fusedadagrad (#9) · 65490af6
  Chaitanya Sri Krishna Lolla authored May 18, 2020
  
  65490af6
15 May, 2020 4 commits
- Merge pull request #8 from lcskrishna/ifu_05152020 · f7fa414f
  Ashish Farmer authored May 15, 2020
```
[Upstream] IFU 05/15/2020
```
  f7fa414f
- Merge branch 'master' into ifu_05152020 · fc73954a
  Chaitanya Sri Krishna Lolla authored May 15, 2020
  
  fc73954a
- remove whitespaces · e1267a9a
  rohithkrn authored May 15, 2020
  
  e1267a9a
- add tests for O4 and O5 opt levels · 32157739
  rohithkrn authored May 15, 2020
  
  32157739
14 May, 2020 1 commit
- Add FusedAdagrad (#822) · 3bae8c83
  Andrew Tulloch authored May 14, 2020
  
  3bae8c83
13 May, 2020 3 commits
- Fixes flake8 --select W605 test warnings (#829) · 9165b27f
  Andrew Sears authored May 13, 2020
```
Signed-off-by: asears <asears@users.noreply.github.com>
```
  9165b27f
- Merge remote-tracking branch 'rocm_up/master' into apex_amp_bfp15 · ba2407e2
  rohithkrn authored May 12, 2020
  
  ba2407e2
- add bfloat16 tests in test_basic_casts · d283f97f
  rohithkrn authored May 12, 2020
  
  d283f97f
12 May, 2020 5 commits
- Enable support for sparse tensors for multi_tensor_apply (#6) · 02a5274b
  Chaitanya Sri Krishna Lolla authored May 12, 2020
  
  02a5274b
- Merge pull request #753 from NVIDIA/revertable_fused_adam_with_mt_support · e1b7997a
  Thor Johnsen authored May 12, 2020
```
Reversible fused adam with mt support
```
  e1b7997a
- enable multi tensor extension for bfloat16 · 69251362
  rohithkrn authored May 11, 2020
  
  69251362
- revert to original · cec08a41
  rohithkrn authored May 11, 2020
  
  cec08a41
- Resolve possible race condition in stride_finite_check kernel · 758826fc
  Thor Johnsen authored May 11, 2020
  
  758826fc
11 May, 2020 1 commit
- disable multi tensor apply for O4, O5 · 3ff2178c
  rohithkrn authored May 10, 2020
  
  3ff2178c
09 May, 2020 1 commit
- add bfloat16 register functions, enable rnn functions, enable promote functions · de3f3fea
  rohithkrn authored May 08, 2020
  
  de3f3fea
08 May, 2020 3 commits
- Merge · 0bfb8300
  Thor Johnsen authored May 08, 2020
  
  0bfb8300
- Merge remote-tracking branch 'rocm_up/master' into apex_amp_bfp16 · 6e14df49
  rohithkrn authored May 08, 2020
  
  6e14df49
- basic enablement for O4 and O5 opt levels · c7fd532c
  rohithkrn authored May 08, 2020
  
  c7fd532c
07 May, 2020 5 commits
- Enable fusedlayernorm extension (#3) · 2d0f9cf2
  Chaitanya Sri Krishna Lolla authored May 07, 2020
  
  2d0f9cf2
- Resolve merge conflict · 2619f1cb
  Thor Johnsen authored May 07, 2020
  
  2619f1cb
- enable python only base sparse tensor support for loss scaling (#2) · 3ccdd63d
  Chaitanya Sri Krishna Lolla authored May 07, 2020
  
  3ccdd63d
- [Upstream] IFU 05072020 (#4) · e85a1d4b
  Chaitanya Sri Krishna Lolla authored May 07, 2020

  * fix dropout scaling from p to 1/(1-p) (#816)
  Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>

  * Improvements to apex.mlp (#804)

  * update fused bias relu backward kernel

  * adding support for not require first layer dgrad

  * fix bug: wrong layer in requires grad

  * add infrastructure for optional bias and activation, currently only support no bias and no relu

  * make bias and relu optional separately

  * add sigmoid activation option

  * enable wider load/store for multi_tensor_apply kernels (#763)

  * modify MTA axpby for wider load/store

  * Make scale/axpby/l2/adam/lamb multi_tensor uses wider load

  * Changes to make xentropysoftmax load/store vectorized when possible: (#725)

  * Changes to make xentropysoftmax load/store vectorized when possible:
  Increase default ILP so that each thread handle 16 Bytes data in one step
  Make thread load/store longest vector possible
  Make unroll case handle adjacent data instead of strided, so same order compare to vector case

  * Add shift for not aligned case. Remove less than 16 bytes aligned access
  Co-authored-by: Burc Eryilmaz <sberyilm@gmail.com>
  Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
  Co-authored-by: Deyu Fu <deyuf@nvidia.com>

  e85a1d4b
- Slight improvements · 91a5a87e
  Thor Johnsen authored May 06, 2020
  
  91a5a87e
06 May, 2020 3 commits
- Re-introduce original non-reversible fused contrib adam cuda kernel · 25c80afe
  Thor Johnsen authored May 06, 2020
  
  25c80afe
- Revert regular contrib fused adam optimizer · 9bb71066
  Thor Johnsen authored May 06, 2020
  
  9bb71066
- Ultra-simple global all-reduce version of distributed optimizer · 7e3536dd
  Thor Johnsen authored May 05, 2020
  
  7e3536dd
05 May, 2020 1 commit
- Try out different partition scheme · a60bbe63
  Thor Johnsen authored May 04, 2020
  
  a60bbe63
04 May, 2020 1 commit
- Bug fix · 7da28fc3
  Thor Johnsen authored May 04, 2020
  
  7da28fc3
01 May, 2020 1 commit
- Changes to make xentropysoftmax load/store vectorized when possible: (#725) · cf50dc7c
  Deyu Fu authored Apr 30, 2020

  * Changes to make xentropysoftmax load/store vectorized when possible:
  Increase default ILP so that each thread handle 16 Bytes data in one step
  Make thread load/store longest vector possible
  Make unroll case handle adjacent data instead of strided, so same order compare to vector case

  * Add shift for not aligned case. Remove less than 16 bytes aligned access

  cf50dc7c
30 Apr, 2020 2 commits
- enable wider load/store for multi_tensor_apply kernels (#763) · 17ee854e
  Deyu Fu authored Apr 30, 2020
```
* modify MTA axpby for wider load/store

* Make scale/axpby/l2/adam/lamb multi_tensor uses wider load
```
  17ee854e
- Improvements to apex.mlp (#804) · 31aceeaa
  Deyu Fu authored Apr 30, 2020

  * update fused bias relu backward kernel

  * adding support for not require first layer dgrad

  * fix bug: wrong layer in requires grad

  * add infrastructure for optional bias and activation, currently only support no bias and no relu

  * make bias and relu optional separately

  * add sigmoid activation option

  31aceeaa