- 19 Oct, 2021 1 commit

Hubert Lu authored

- 08 Oct, 2021 1 commit

Masaki Kozuki authored
* run backward
* remove custom_fwd/custom_bwd

- 06 Oct, 2021 1 commit

Masaki Kozuki authored
* [ColumnParallelLinear] Test behavior in autocast
* fix test
* casts manually to autocast dtype
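Manually casting to the autocast dtype inside a custom layer might look like the following minimal sketch (plain PyTorch, not apex's actual ColumnParallelLinear; the function name is illustrative only):

```python
import torch
import torch.nn.functional as F

def column_parallel_linear_sketch(input, weight, bias=None):
    # Sketch: ops that bypass autocast's automatic casting can cast their
    # operands to the active autocast dtype explicitly before computing.
    if torch.is_autocast_enabled():
        dtype = torch.get_autocast_gpu_dtype()   # e.g. torch.float16
        input = input.to(dtype)
        weight = weight.to(dtype)
        bias = bias.to(dtype) if bias is not None else None
    return F.linear(input, weight, bias)
```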

- 02 Oct, 2021 1 commit

Masaki Kozuki authored
Co-authored-by: Piotr Bialecki <pbialecki@nvidia.com>
Co-authored-by: Eddie Yan <eddiey@nvidia.com>
Co-authored-by: Rishi Puri <riship@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>

- 15 Apr, 2021 1 commit

Sudhakar Singh authored
* Add unit tests for fused-novograd
* Fix: tensors should reside on the same device
* Fix: the CUDA stream should be queried on the same device on which the tensors reside. Found this while debugging the fused-novograd multi-device unit test
* Fix issues mentioned in the review comments
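That stream-vs-device pitfall can be illustrated with a short sketch (not the actual fused-novograd code): ask for the current stream on the tensor's own device rather than on whatever device happens to be current.

```python
import torch

def stream_for(tensor):
    # Sketch: pass the tensor's device explicitly so the returned stream
    # belongs to the device the tensor lives on, not the default device.
    return torch.cuda.current_stream(device=tensor.device)
```

With several GPUs, calling torch.cuda.current_stream() with no argument returns a stream on the currently selected device, which is not necessarily where the optimizer's tensors live.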

- 21 Jan, 2021 1 commit

Jeff Daily authored
use __launch_bounds__(1024) for multi_tensor_apply, re-enable skipped tests

- 18 Jan, 2021 1 commit

Jeff Daily authored

- 15 Jan, 2021 1 commit

Sarunya Pumma authored

- 31 Dec, 2020 2 commits

lcskrishna authored

lcskrishna authored

- 01 Dec, 2020 1 commit

Kexin Yu authored
DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>

- 04 Nov, 2020 1 commit

Ashish Farmer authored
* fix warp size in WARP_SHFL* in layernorm
* enable fused_layer_norm tests on ROCm

- 05 Aug, 2020 2 commits

Chaitanya Sri Krishna Lolla authored
* enable mlp cuda
* add setup changes and tests
* skip the unit tests
* updated conditions for empty array
* removed hip platform conditions

ngimel authored
* add device guards to the optimizers
* add untracked file
* set deviceGuard in multi_tensor_apply
* address review comments; fix lamb
* indent
* typo
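A Python-level sketch of the device-guard idea (the actual change sets a C++ device guard inside multi_tensor_apply; the helper below is illustrative only):

```python
import torch

def update_on_each_device(params, update_fn):
    # Sketch: make each parameter's device the current CUDA device while that
    # parameter is updated, so current-device/stream queries made during the
    # update refer to the GPU the parameter actually lives on.
    for p in params:
        with torch.cuda.device(p.device):
            update_fn(p)
```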

- 07 Jul, 2020 1 commit

lcskrishna authored

- 23 Jun, 2020 3 commits

- 26 May, 2020 1 commit

rohithkrn authored

- 21 May, 2020 2 commits

lcskrishna authored

sunway513 authored

- 20 May, 2020 2 commits

lcskrishna authored

lcskrishna authored

- 19 May, 2020 4 commits

lcskrishna authored

lcskrishna authored

lcskrishna authored

lcskrishna authored

- 15 May, 2020 2 commits

- 14 May, 2020 1 commit

Andrew Tulloch authored

- 13 May, 2020 1 commit

rohithkrn authored

- 07 May, 2020 1 commit

Chaitanya Sri Krishna Lolla authored
* fix dropout scaling from p to 1/(1-p) (#816)
  Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
* Improvements to apex.mlp (#804)
  * update fused bias relu backward kernel
  * add support for not requiring first-layer dgrad
  * fix bug: wrong layer in requires_grad
  * add infrastructure for optional bias and activation; currently only supports no bias and no relu
  * make bias and relu optional separately
  * add sigmoid activation option
* enable wider load/store for multi_tensor_apply kernels (#763)
  * modify MTA axpby for wider load/store
  * make scale/axpby/l2/adam/lamb multi_tensor use wider loads
* make xentropy softmax load/store vectorized when possible (#725)
  * increase default ILP so that each thread handles 16 bytes of data per step
  * make each thread load/store the longest vector possible
  * make the unroll case handle adjacent data instead of strided data, so the ordering matches the vectorized case
  * add a shift for the unaligned case; remove accesses aligned to less than 16 bytes
Co-authored-by: Burc Eryilmaz <sberyilm@gmail.com>
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
Co-authored-by: Deyu Fu <deyuf@nvidia.com>
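For the first item, the corrected behaviour is standard inverted dropout: surviving activations are scaled by 1/(1-p) rather than by p. A minimal reference sketch in plain PyTorch (not the fused apex kernel):

```python
import torch

def dropout_reference(x, p, training=True):
    # Inverted dropout: zero each element with probability p and scale the
    # survivors by 1/(1-p), so the expected value of the output equals x.
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) >= p).to(x.dtype)
    return x * mask / (1.0 - p)
```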

- 30 Apr, 2020 1 commit

Deyu Fu authored
* update fused bias relu backward kernel
* add support for not requiring first-layer dgrad
* fix bug: wrong layer in requires_grad
* add infrastructure for optional bias and activation; currently only supports no bias and no relu
* make bias and relu optional separately
* add sigmoid activation option
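In plain PyTorch terms, the "optional bias and activation" combinations correspond to something like the sketch below (a reference formulation, not the fused apex.mlp kernels; the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def mlp_layer_reference(x, weight, bias=None, activation="none"):
    # One MLP layer with the options the commit describes: bias is optional,
    # and the activation can be skipped, relu, or sigmoid.
    y = F.linear(x, weight, bias)      # bias=None means "no bias"
    if activation == "relu":
        y = F.relu(y)
    elif activation == "sigmoid":
        y = torch.sigmoid(y)
    return y                           # "none": no activation
```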

- 22 Apr, 2020 2 commits

Deyu Fu authored

Vinicius Reis authored
The LARC optimizer wraps an underlying optimizer and then needs to be passed to amp.initialize for mixed precision. There were 3 different crashes happening in this situation; this change fixes all of them and adds a unit test. I don't know if the 'LARC' in sys.modules check ever worked; in my setup, the entry in sys.modules is 'apex.parallel.LARC'. Checking whether the variable is defined seems more reliable.
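The usage pattern being fixed looks roughly like this (a sketch assuming apex is installed; the model, learning rate, and opt_level are placeholders):

```python
import torch
from apex import amp
from apex.parallel.LARC import LARC

model = torch.nn.Linear(128, 10).cuda()
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = LARC(base_optimizer)   # LARC wraps the underlying optimizer

# Passing the wrapped optimizer through amp.initialize is the combination
# this commit describes as previously crashing.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
```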

- 31 Mar, 2020 1 commit

Jeff Bowles authored

- 27 Feb, 2020 1 commit

mcarilli authored
* NHWC support for multi tensor apply
* compilation fix for version<=1.4

- 03 Oct, 2019 1 commit

ptrblck authored
* increase atol for Half-Float comparison to 1.5e-4
* disable tests for different opt_levels
* reset atol
* add bitwise accurate comparison
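The tolerance change amounts to test comparisons of roughly this shape (a standalone sketch; the tensors here just simulate a half-precision round trip):

```python
import torch

reference = torch.rand(1024) * 0.1          # small values, so atol dominates
candidate = reference.half().float()        # simulate a half-precision path

# Relaxed comparison for Half-vs-Float results.
assert torch.allclose(reference, candidate, atol=1.5e-4)

# Bitwise-accurate comparison where exact agreement is expected.
assert torch.equal(candidate, candidate.clone())
```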

- 03 Sep, 2019 1 commit

Deyu Fu authored
* move import of amp_C to __init__()
* make fp16/32 separate lists to support mixed param types, disable double test
* make zero_grad consistent between adam/novograd/lamb
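The "separate fp16/fp32 lists" idea can be sketched as a small helper (plain Python; not apex's internal bookkeeping):

```python
import torch

def split_params_by_dtype(params):
    # Keep fp16 and fp32 parameters in separate lists so each group can be
    # passed to a multi-tensor kernel of a single, consistent dtype.
    fp16, fp32 = [], []
    for p in params:
        if p.dtype == torch.float16:
            fp16.append(p)
        elif p.dtype == torch.float32:
            fp32.append(p)
        else:
            raise TypeError(f"unsupported parameter dtype: {p.dtype}")
    return fp16, fp32
```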

- 27 Aug, 2019 1 commit

ptrblck authored
* add state_dict, load_state_dict
* add test_restoring, test_loss_scale_decrease
* disable amp outputs for checkpoint tests
* add test for amp.state_dict, cleanup
* add state_dict patch, add test
* fixed testing, cleanup
* add readme for checkpointing
* add docs to source/amp
* add review changes to doc
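The checkpointing pattern these tests and docs cover follows the amp checkpointing recipe; a condensed sketch (the path and hyperparameters are placeholders):

```python
import torch
from apex import amp

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# Save model, optimizer, and amp state (e.g. the loss scaler) together.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "amp": amp.state_dict(),
}
torch.save(checkpoint, "checkpoint.pt")

# Restore after amp.initialize has been called with the same opt_level.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
amp.load_state_dict(checkpoint["amp"])
```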