"vscode:/vscode.git/clone" did not exist on "abb8dd57f8a86a71b5f8fe1f059aee3636a658b1"
- 02 May, 2020 1 commit
-
-
Kexin Yu authored
-
- 01 May, 2020 4 commits
-
-
Kexin Yu authored
https://github.com/NVIDIA/apex
-
Kexin Yu authored
-
Deyu Fu authored
* Changes to make xentropysoftmax loads/stores vectorized when possible:
  * Increase the default ILP so that each thread handles 16 bytes of data per step
  * Make each thread load/store the longest vector possible
  * Make the unroll case handle adjacent data instead of strided data, so the ordering matches the vectorized case
* Add a shift for the not-aligned case. Remove aligned accesses of less than 16 bytes
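For illustration, the access pattern described above (16 bytes per thread per step, the longest vector the alignment allows, and a scalar "shift" prologue for a misaligned head) can be sketched as a standalone copy kernel. This is a hedged sketch, not the apex xentropy kernel; the name `vectorized_copy` and the plain copy semantics are invented for clarity.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Minimal sketch of the access pattern above (not the apex kernel): each
// thread moves 16 bytes per step via float4, with a scalar prologue ("shift")
// for a misaligned head and a scalar tail. Assumes `in` and `out` share the
// same alignment offset modulo 16 bytes.
__global__ void vectorized_copy(const float* __restrict__ in,
                                float* __restrict__ out, int n) {
  // Number of leading scalar elements needed to reach a 16-byte boundary.
  int shift = static_cast<int>(
      ((16 - (reinterpret_cast<uintptr_t>(in) & 15)) & 15) / sizeof(float));
  if (shift > n) shift = n;

  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = gridDim.x * blockDim.x;

  // Scalar prologue: elements before the first 16-byte boundary.
  for (int i = tid; i < shift; i += stride) out[i] = in[i];

  // Vectorized body: adjacent float4 elements, 16 bytes per load/store.
  int n_vec = (n - shift) / 4;
  const float4* in4 = reinterpret_cast<const float4*>(in + shift);
  float4* out4 = reinterpret_cast<float4*>(out + shift);
  for (int i = tid; i < n_vec; i += stride) out4[i] = in4[i];

  // Scalar epilogue: tail that does not fill a full float4.
  for (int i = shift + 4 * n_vec + tid; i < n; i += stride) out[i] = in[i];
}
```

Handling the head and tail with scalar accesses keeps the vectorized body free of per-element alignment checks, which is the point of processing adjacent rather than strided elements.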
-
Kexin Yu authored
-
- 30 Apr, 2020 4 commits
-
-
Kexin Yu authored
-
Deyu Fu authored
* Modify MTA axpby for wider load/store
* Make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
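A hedged sketch of what a "wider load" means for an elementwise op such as axpby; this is not apex's multi_tensor_apply machinery, and it assumes 16-byte-aligned inputs whose length is a multiple of 4.

```cuda
#include <cuda_runtime.h>

// Minimal sketch of axpby (out = a*x + b*y) with 16-byte float4 accesses.
// The real multi_tensor kernels also handle tails, misalignment, and chunked
// tensor lists; this only shows the wider load/store itself.
__global__ void axpby_vec4(float a, const float4* __restrict__ x,
                           float b, const float4* __restrict__ y,
                           float4* __restrict__ out, int n_vec) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n_vec) {
    float4 xv = x[i];  // one 16-byte load instead of four 4-byte loads
    float4 yv = y[i];
    float4 r;
    r.x = a * xv.x + b * yv.x;
    r.y = a * xv.y + b * yv.y;
    r.z = a * xv.z + b * yv.z;
    r.w = a * xv.w + b * yv.w;
    out[i] = r;        // one 16-byte store
  }
}
```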
-
Deyu Fu authored
* Update the fused bias-relu backward kernel
* Add support for not requiring the first layer's dgrad
* Fix bug: wrong layer used in the requires-grad check
* Add infrastructure for optional bias and activation; currently only "no bias and no relu" is supported
* Make bias and relu optional separately
* Add a sigmoid activation option
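One common way to make bias and the activation independently optional in a fused kernel is compile-time dispatch on template parameters. The sketch below is illustrative only; it is not apex's MLP/fused-dense code, and the names `fused_bias_act` and `Activation` are invented.

```cuda
#include <cuda_runtime.h>

enum class Activation { None, Relu, Sigmoid };

// Optional bias and optional activation selected at compile time, so the
// generated code contains no runtime branches for the disabled features.
template <bool HasBias, Activation Act>
__global__ void fused_bias_act(const float* __restrict__ in,
                               const float* __restrict__ bias,
                               float* __restrict__ out, int rows, int cols) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= rows * cols) return;
  float v = in[idx];
  if (HasBias) v += bias[idx % cols];                   // optional bias add
  if (Act == Activation::Relu)        v = v > 0.f ? v : 0.f;
  else if (Act == Activation::Sigmoid) v = 1.f / (1.f + expf(-v));
  out[idx] = v;
}

// Host-side dispatch picks the instantiation, e.g.:
//   fused_bias_act<true, Activation::Sigmoid><<<grid, block>>>(...);
```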
-
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
-
- 28 Apr, 2020 1 commit
-
-
Kexin Yu authored
-
- 23 Apr, 2020 1 commit
-
-
ptrblck authored
* Add CUDAGenerator guard
* Fix generator_flag
* Add guards for the gen pointer/ref issue
* Change mutex_ to mutex()
* Add check_generator
Co-authored-by: pbialecki <pbialecki@nvidia.com>
-
- 22 Apr, 2020 2 commits
-
-
Deyu Fu authored
-
Vinicius Reis authored
The LARC optimizer wraps an underlying optimizer and then needs to be passed to amp.initialize for mixed precision. There were three different crashes happening in this situation; fix all of them and add a unit test. I don't know if the 'LARC' in sys.modules check ever worked; in my setup, the entry in sys.modules is 'apex.parallel.LARC'. Checking whether the variable is defined seems more reliable, though.
-
- 20 Apr, 2020 2 commits
- 13 Apr, 2020 1 commit
-
-
Mannat Singh authored
-
- 05 Apr, 2020 2 commits
- 03 Apr, 2020 4 commits
- 02 Apr, 2020 1 commit
-
-
Kexin Yu authored
-
- 01 Apr, 2020 2 commits
- 31 Mar, 2020 2 commits
-
-
Kexin Yu authored
-
Jeff Bowles authored
-
- 25 Mar, 2020 1 commit
-
-
msbaines authored
The CUDA kernel used by fused-adam was using the default stream on the default device. The kernel needs to use the same device as the parameter tensor. Fixed by using a context manager to set the correct default device. For the use_mt case, an error is raised instead. Alternatively, the use_mt case could launch one kernel per CUDA device. The non-contrib version will also need to be fixed.
Co-authored-by: Mandeep Singh Baines <msb@fb.com>
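The commit describes fixing this from the wrapper side with a context manager that sets the correct default device. For comparison, the sketch below shows the analogous guarantee expressed inside a C++/CUDA extension with a device guard; the kernel and launcher are hypothetical and do not reflect the apex contrib implementation.

```cuda
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

// Hypothetical kernel (not apex's FusedAdam): the update is a placeholder,
// the point is the device guard in the launcher below.
__global__ void fused_adam_step(float* p, const float* g, float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) p[i] -= lr * g[i];  // placeholder update, not the real Adam math
}

void fused_adam_launch(at::Tensor param, at::Tensor grad, float lr) {
  // Make the parameter's device current for this scope, so the kernel does
  // not silently run on the default device.
  const c10::cuda::CUDAGuard device_guard(param.device());
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();  // that device's stream
  int n = static_cast<int>(param.numel());
  int block = 256, grid = (n + block - 1) / block;
  fused_adam_step<<<grid, block, 0, stream>>>(
      param.data_ptr<float>(), grad.data_ptr<float>(), lr, n);
}
```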
-
- 23 Mar, 2020 2 commits
- 21 Mar, 2020 2 commits
- 20 Mar, 2020 3 commits
- 17 Mar, 2020 2 commits
- 11 Mar, 2020 2 commits
-
-
ptrblck authored
* Disable ninja for multihead_attn
* Fix getCurrentStream in multihead_attn
Co-authored-by: pbialecki <pbialecki@nvidia.com>
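The getCurrentStream fix presumably comes down to launching work on the stream PyTorch currently has active rather than the legacy default stream. A minimal sketch with an invented kernel, not the multihead_attn source:

```cuda
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

// Invented example kernel; the relevant change is the launch argument below.
__global__ void scale_kernel(float* x, float s, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= s;
}

void scale_on_current_stream(at::Tensor t, float s) {
  // Launch on the stream PyTorch currently has active for this device, so the
  // kernel is ordered correctly with surrounding ATen operations.
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  int n = static_cast<int>(t.numel());
  scale_kernel<<<(n + 255) / 256, 256, 0, stream>>>(t.data_ptr<float>(), s, n);
}
```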
-
Tomasz Grel authored
* Do not unscale the gradients if the loss scale is equal to 1
* Skip unscaling for loss scale == 1 only when static loss scaling is used
-
- 02 Mar, 2020 1 commit
-