- 15 May, 2020 1 commit
rohithkrn authored
- 13 May, 2020 2 commits
- 12 May, 2020 3 commits
Chaitanya Sri Krishna Lolla authored
rohithkrn authored
rohithkrn authored
- 11 May, 2020 1 commit
rohithkrn authored
- 09 May, 2020 1 commit
rohithkrn authored
- 08 May, 2020 2 commits
- 07 May, 2020 3 commits
Chaitanya Sri Krishna Lolla authored
Chaitanya Sri Krishna Lolla authored
Chaitanya Sri Krishna Lolla authored
* fix dropout scaling from p to 1/(1-p) (#816) (see the sketch after this list). Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
* Improvements to apex.mlp (#804)
  * update the fused bias-relu backward kernel
  * add support for not requiring the first layer's dgrad
  * fix bug: wrong layer used in the requires-grad check
  * add infrastructure for optional bias and activation; currently only no bias and no relu are supported
  * make bias and relu optional separately
  * add a sigmoid activation option
* enable wider load/store for multi_tensor_apply kernels (#763)
  * modify MTA axpby for wider load/store
  * make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
* make xentropy softmax load/store vectorized when possible (#725)
  * increase the default ILP so that each thread handles 16 bytes of data per step
  * make each thread load/store the longest vector possible
  * make the unroll case handle adjacent data instead of strided...
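The dropout item above refers to the standard inverted-dropout convention: surviving activations are scaled by 1/(1-p) at training time so that no rescaling is needed at inference. The snippet below is a minimal PyTorch sketch of that convention for illustration only; it is not the fused apex kernel touched by #816.

```python
import torch

def inverted_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Zero each element with probability p and scale the survivors by 1/(1-p),
    so the expected activation matches eval mode (no rescaling at inference)."""
    if not training or p == 0.0:
        return x
    keep_prob = 1.0 - p
    mask = (torch.rand_like(x) < keep_prob).to(x.dtype)
    return x * mask / keep_prob
```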
- 28 Apr, 2020 1 commit
Chaitanya Sri Krishna Lolla authored
* Initial commit to hipify all CUDA code
* enable the multi_tensor_apply extension
* added generatedFileCleaner to handle nested hip files
- 23 Apr, 2020 1 commit
ptrblck authored
* add CUDAGenerator guard
* fix generator_flag
* add guards for gen pointer/ref issue
* change mutex_ to mutex()
* add check_generator

Co-authored-by: pbialecki <pbialecki@nvidia.com>
- 22 Apr, 2020 2 commits
Deyu Fu authored
Vinicius Reis authored
The LARC optimizer wraps an underlying optimizer and then needs to be passed to amp.initialize for mixed precision. Three different crashes could occur in this situation; this change fixes all of them and adds a unit test. It is unclear whether the 'LARC' in sys.modules check ever worked: in my setup the entry in sys.modules is 'apex.parallel.LARC', so checking whether the variable is defined seems more reliable.
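For context, a usage sketch of the wrap-then-initialize pattern this commit exercises, assuming apex is installed with a CUDA build; the model, learning rate, and opt level are placeholders.

```python
import torch
from apex import amp
from apex.parallel.LARC import LARC

model = torch.nn.Linear(1024, 1024).cuda()
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# LARC wraps the underlying optimizer...
optimizer = LARC(base_optimizer)

# ...and the wrapped optimizer is what gets handed to amp.initialize
# for mixed precision.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
```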
- 20 Apr, 2020 2 commits
- 13 Apr, 2020 1 commit
Mannat Singh authored
- 05 Apr, 2020 2 commits
- 03 Apr, 2020 4 commits
- 02 Apr, 2020 1 commit
Kexin Yu authored
- 01 Apr, 2020 2 commits
- 31 Mar, 2020 2 commits
Kexin Yu authored
Jeff Bowles authored
- 25 Mar, 2020 1 commit
msbaines authored
The CUDA kernel used by fused-adam was launched on the default stream of the default device, but it needs to run on the same device as the parameter tensor. Fixed by using a context manager to set the correct default device. For the use_mt case an error is now raised; alternatively, that case could launch one kernel per CUDA device. The non-contrib version will also need to be fixed. Co-authored-by: Mandeep Singh Baines <msb@fb.com>
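A minimal sketch of the device-guard pattern described above. `launch_fused_adam_kernel` is a hypothetical stand-in for the actual kernel launch; the point is only the `torch.cuda.device` context manager selecting the parameter's device before the launch.

```python
import torch

def launch_on_param_device(param: torch.Tensor, launch_fused_adam_kernel):
    # Select the device that owns the parameter so the kernel (and the
    # default stream it uses) runs there, not on device 0.
    with torch.cuda.device(param.device):
        launch_fused_adam_kernel(param)
```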
- 23 Mar, 2020 2 commits
- 21 Mar, 2020 2 commits
- 20 Mar, 2020 3 commits
- 17 Mar, 2020 1 commit
Kexin Yu authored
Add an additional loop over the lists of params when loading state_dict in apex.contrib.optimizers.FP16_Optimizer.
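A hypothetical sketch of the nested-loop copy this change describes; the actual attribute names inside apex.contrib.optimizers.FP16_Optimizer may differ. The fp32 master weights are kept as a list of parameter lists (one inner list per param group), so restoring a checkpoint needs an extra outer loop over groups.

```python
import torch

def load_master_weights(current_groups, saved_groups):
    """Copy saved fp32 master weights (a list of parameter lists, one inner
    list per param group) back into the optimizer's current master weights."""
    for current_group, saved_group in zip(current_groups, saved_groups):
        for current_param, saved_param in zip(current_group, saved_group):
            current_param.data.copy_(saved_param.data)
```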