Commits · aee9f00d90d727b463e2830d3b3adbf6bafd8fc5 · OpenDAS / apex

27 Oct, 2021 1 commit
- Revert "Enable MLP unit tests on ROCm" · aee9f00d
  hubertlu authored Oct 27, 2021
```
This reverts commit 964e61f1.
```
  aee9f00d
26 Oct, 2021 1 commit
- Enable MLP unit tests on ROCm · 964e61f1
  hubertlu authored Oct 26, 2021
  
  964e61f1
20 Oct, 2021 1 commit
- Revert test_fused_layer_norm.py to prevent from missing torch.cuda.is_bf16_supported in pytorch 1.9 · d36b3c63
  Hubert Lu authored Oct 20, 2021
  
  d36b3c63
19 Oct, 2021 1 commit
- Revert back to the test_fused_optimizer.py in upstream to solve multiple unit test errors · 93f3a3bc
  Hubert Lu authored Oct 19, 2021
  
  93f3a3bc
08 Oct, 2021 1 commit
- Remove `custom_fwd`/`custom_bwd` from fused softmax (#1188) · 14ccf598
  Masaki Kozuki authored Oct 09, 2021
```
* run backward

* remove custom_fwd/custom_bwd
```
  14ccf598
06 Oct, 2021 1 commit
- ColumnParallelLinearWithAsyncAllreduce autocast support (#1183) · b3da6036
  Masaki Kozuki authored Oct 06, 2021
```
* [ColumnParallelLinear] Test behavior in autocast

* fix test

* casts manually to autocast dtype
```
  b3da6036
02 Oct, 2021 1 commit

Masaki Kozuki authored Oct 02, 2021

Co-authored-by: Piotr Bialecki <pbialecki@nvidia.com>
Co-authored-by: Eddie Yan <eddiey@nvidia.com>
Co-authored-by: Rishi Puri <riship@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>

365fdc18

15 Apr, 2021 1 commit

Add unit tests for Fused NovoGrad (#1065) · 59d2f7ac

Sudhakar Singh authored Apr 15, 2021

* Add unit tests for fused-novograd

* Fix: tensors should reside on the same device

* Fix: Cudastream should be called on the same device on which the tensors reside on. Found this during debugging fused novograd multi-device unit test

* fixed issues mentioned in the comments

59d2f7ac

25 Jan, 2021 1 commit

fix bugs in syncbn (#46) · 3f49dbf0

Jeff Daily authored Jan 25, 2021

- incorrect use of __shfl_down
- fix warp size assumptions
- update unit tests to exit on failure

3f49dbf0

21 Jan, 2021 1 commit
- use __launch_bounds__ for multi_tensor_apply (#44) · 5baa68d3
  Jeff Daily authored Jan 21, 2021
```
use __launch_bounds__(1024) for multi_tensor_apply, re-enable skipped tests
```
  5baa68d3
18 Jan, 2021 1 commit
- skip failing tests on ROCm · 13c8d152
  Jeff Daily authored Jan 18, 2021
  
  13c8d152
15 Jan, 2021 1 commit
- Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm · ff232fb8
  Sarunya Pumma authored Nov 28, 2020
  
  ff232fb8
31 Dec, 2020 2 commits
- missing import statement · 41bbf93c
  lcskrishna authored Dec 31, 2020
  
  41bbf93c
- skip the unit tests · 5bae299e
  lcskrishna authored Dec 31, 2020
  
  5bae299e
01 Dec, 2020 1 commit

DistributedFusedAdam Model Parallelism Support (Megatron) (#981) · 6b7e77b0

Kexin Yu authored Dec 01, 2020

DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>

6b7e77b0

04 Nov, 2020 1 commit

Fix LayerNorm op on ROCm (#36) · 7eed38aa

Ashish Farmer authored Nov 04, 2020

* fix warp size in WARP_SHFL* in layernorm

* enable fused_layer_norm tests on ROCm

7eed38aa

05 Aug, 2020 2 commits

Enable mlp_cuda extension. (#28) · d2f6d04a

Chaitanya Sri Krishna Lolla authored Aug 05, 2020

* enable mlp cuda

* add setup changes and tests

* skip the unit tests

* updated conditions for empty array

* removed hip platform conditions

d2f6d04a

set device guard for multi tensor optimizer implementations (#927) · 274cc063

ngimel authored Aug 05, 2020

* add device guards to the optimizers

* add untracked file

* set deviceGuard in multi_tensor_apply

* address review comments; fix lamb

* indent

* typo

274cc063

31 Jul, 2020 1 commit
- skipping bfloat16 mgpu tests (#32) · 8dd19e3b
  Chaitanya Sri Krishna Lolla authored Jul 31, 2020
  
  8dd19e3b
10 Jul, 2020 1 commit

Enable sync batchnorm extension. (#27) · 9c80f6d3

Chaitanya Sri Krishna Lolla authored Jul 10, 2020

* Enable sync batchnorm

* enable syncbn properly

* update the unit tests

* update tests

* update conditions for welford_merge_element

* updated conditions based on comments.

9c80f6d3

07 Jul, 2020 1 commit
- skip newer tests · eba809d7
  lcskrishna authored Jul 07, 2020
  
  eba809d7
06 Jul, 2020 1 commit

[sync BN] (#792) · 1ff54b8f

jjsjann123 authored Jul 06, 2020

* [sync BN]

support non-uniform batch size across process group.

TODO: test should be added once cleaned up.

* updating unit tests

* new unit tests for different inputs

* cleaning

1ff54b8f

23 Jun, 2020 3 commits
- add test case for non-zero weight decay · ad50ce9a
  Kexin Yu authored Jun 23, 2020
  
  ad50ce9a
- test nvlamb; hyperparams consistent with adam/adagrad tests · cd3d6d12
  Kexin Yu authored Jun 23, 2020
  
  cd3d6d12
- add test for FusedLAMB · 9774ce0d
  Kexin Yu authored Jun 22, 2020
  
  9774ce0d
03 Jun, 2020 1 commit

bfloat16 support for mgpu (#19) · b0c7d09f

rohithkrn authored Jun 03, 2020

* bfloat16 support for apex DDP

* enable mgpu tests for fp16 and bf16

* update Dockerfile

b0c7d09f

26 May, 2020 1 commit
- enable bfloat16 for optimizers · 85549903
  rohithkrn authored May 26, 2020
  
  85549903
21 May, 2020 2 commits
- enable skipped unit tests fused_sgd, multiple_models_and_optimizers · 9297be60
  lcskrishna authored May 21, 2020
  
  9297be60
- add ROCm L0 test script · 486fc0ed
  sunway513 authored May 20, 2020
  
  486fc0ed
20 May, 2020 2 commits
- missing import packages · 27310f34
  lcskrishna authored May 20, 2020
  
  27310f34
- skip tests that are failing after bfp16 · 2e2584fc
  lcskrishna authored May 20, 2020
  
  2e2584fc
19 May, 2020 4 commits
- enable run_optimizer tests · 49db74c8
  lcskrishna authored May 19, 2020
  
  49db74c8
- enable run_amp tests · 464e95f5
  lcskrishna authored May 19, 2020
  
  464e95f5
- enable fp16_utils test suite · d0555980
  lcskrishna authored May 19, 2020
  
  d0555980
- create a base framework for adding tests · a73d7d3b
  lcskrishna authored May 19, 2020
  
  a73d7d3b
15 May, 2020 2 commits
- remove whitespaces · e1267a9a
  rohithkrn authored May 15, 2020
  
  e1267a9a
- add tests for O4 and O5 opt levels · 32157739
  rohithkrn authored May 15, 2020
  
  32157739
14 May, 2020 1 commit
- Add FusedAdagrad (#822) · 3bae8c83
  Andrew Tulloch authored May 14, 2020
  
  3bae8c83
13 May, 2020 1 commit
- add bflaot16 tests in test_basic_casts · d283f97f
  rohithkrn authored May 12, 2020
  
  d283f97f
07 May, 2020 1 commit

[Upstream] IFU 05072020 (#4) · e85a1d4b

Chaitanya Sri Krishna Lolla authored May 07, 2020

* fix dropout scaling from p to 1/(1-p) (#816)
  Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>

* Improvements to apex.mlp (#804)
  * update fused bias relu backward kernel
  * adding support for not require first layer dgrad
  * fix bug: wrong layer in requires grad
  * add infrastructure for optional bias and activation, currently only support no bias and no relu
  * make bias and relu optional separately
  * add sigmoid activation option

* enable wider load/store for multi_tensor_apply kernels (#763)
  * modify MTA axpby for wider load/store
  * Make scale/axpby/l2/adam/lamb multi_tensor uses wider load

* Changes to make xentropysoftmax load/store vectorized when possible: (#725)
  * Changes to make xentropysoftmax load/store vectorized when possible:
    Increase default ILP so that each thread handle 16 Bytes data in one step
    Make thread load/store longest vector possible
    Make unroll case handle adjacent data instead of strided, so same order compare to vector case
  * Add shift for not aligned case. Remove less than 16 bytes aligned access

Co-authored-by: Burc Eryilmaz <sberyilm@gmail.com>
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
Co-authored-by: Deyu Fu <deyuf@nvidia.com>

e85a1d4b