Commits · dde39c9f4c8cb5fa1fd4e31e72e7ca2302937793 · OpenDAS / apex

04 Mar, 2021 1 commit
- Merge pull request #47 from ROCmSoftwarePlatform/revert_workaround · dde39c9f
  Peng authored Mar 04, 2021
```
Revert "pass all TensorListMetadata as pointer to pinned host memory (#13)"
```
  dde39c9f
25 Feb, 2021 1 commit
- Revert "pass all TensorListMetadata as pointer to pinned host memory (#13)" · fbb8cd93
  Jeff Daily authored Feb 25, 2021
```
This reverts commit bdd481d1.
```
  fbb8cd93
25 Jan, 2021 1 commit

Jeff Daily authored Jan 25, 2021

- incorrect use of __shfl_down
- fix warp size assumptions
- update unit tests to exit on failure

3f49dbf0
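
The warp-size items above can be illustrated with a small simulation. This is a plain-Python sketch, not the apex kernel code: `shfl_down` and `warp_reduce_sum` are hypothetical helpers that mimic a shuffle-down tree reduction, assuming a lane that reads past the end of the wavefront keeps its own value. It shows how a reduction written for 32 lanes covers only half of a 64-lane AMD wavefront.

```python
def shfl_down(lanes, delta):
    """Hypothetical model of a shuffle-down: lane i reads lane i + delta.

    Assumption: a lane whose source is out of range keeps its own value.
    """
    width = len(lanes)
    return [lanes[i + delta] if i + delta < width else lanes[i]
            for i in range(width)]


def warp_reduce_sum(lanes, assumed_warp_size):
    """Tree reduction whose first offset is derived from the assumed warp size.

    Returns the value held by lane 0 after the reduction.
    """
    vals = list(lanes)
    offset = assumed_warp_size // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


# A 64-lane wavefront holding the values 1..64.
wavefront = list(range(1, 65))

full = warp_reduce_sum(wavefront, assumed_warp_size=64)     # sums all 64 lanes
partial = warp_reduce_sum(wavefront, assumed_warp_size=32)  # sums lanes 0..31 only
```

Here `full` is 2080 (the sum of 1..64), while `partial` is 528 (the sum of 1..32), which is the kind of silent error a hard-coded warp size of 32 can produce on 64-wide hardware.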

21 Jan, 2021 2 commits
- fix cross-compiled ROCm builds when no GPUs detected (#45) · c1e88fae
  Jeff Daily authored Jan 21, 2021
  
  c1e88fae
- use __launch_bounds__ for multi_tensor_apply (#44) · 5baa68d3
  Jeff Daily authored Jan 21, 2021
```
use __launch_bounds__(1024) for multi_tensor_apply, re-enable skipped tests
```
  5baa68d3
19 Jan, 2021 1 commit
- Merge pull request #43 from ROCmSoftwarePlatform/IFU-2021-01-18 · 85b56d01
  Jeff Daily authored Jan 19, 2021
```
IFU-2021-01-18
```
  85b56d01
18 Jan, 2021 5 commits
- skip failing tests on ROCm · 13c8d152
  Jeff Daily authored Jan 18, 2021
  
  13c8d152
- missing #include <c10/cuda/CUDAGuard.h> · 4ebf2b90
  Jeff Daily authored Jan 18, 2021
  
  4ebf2b90
- update setup.py to more closely align with upstream · 2332c4d6
  Jeff Daily authored Jan 18, 2021
```
Mostly whitespace or formatting issues addressed.
Diff with upstream is reduced; ROCm changes are more clear.
```
  2332c4d6
- Merge remote-tracking branch 'upstream/master' · dcc7b513
  Jeff Daily authored Jan 18, 2021
```
Conflicts:
csrc/multi_tensor_apply.cuh
setup.py
tests/L0/run_optimizers/test_adagrad.py
tests/L0/run_optimizers/test_fused_optimizer.py
tests/L0/run_optimizers/test_lamb.py
```
  dcc7b513
- Merge pull request #42 from sarunyap/reduce-block-fix · d061bf20
  Jeff Daily authored Jan 18, 2021
```
Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm
```
  d061bf20
15 Jan, 2021 1 commit
- Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm · ff232fb8
  Sarunya Pumma authored Nov 28, 2020
  
  ff232fb8
31 Dec, 2020 3 commits
- Merge pull request #41 from lcskrishna/cl/skip-tests · 76e4e054
  Chaitanya Sri Krishna Lolla authored Dec 31, 2020
```
Skip the unit tests
```
  76e4e054
- missing import statement · 41bbf93c
  lcskrishna authored Dec 31, 2020
  
  41bbf93c
- skip the unit tests · 5bae299e
  lcskrishna authored Dec 31, 2020
  
  5bae299e
17 Dec, 2020 3 commits
- Merge pull request #1015 from jpool-nv/patch-1 · 154c6336
  Thor Johnsen authored Dec 17, 2020
```
Update ASP README to highlight default recipe
```
  154c6336
- Update ASP README to highlight default recipe · 56914d4f
  jpool-nv authored Dec 17, 2020
```
The Recipe was presented after some non-standard API calls, so moving the suggested usage up, giving it its own section, and reinforcing the suggested usage in the non-standard section.
```
  56914d4f
- Merge pull request #38 from lcskrishna/cl/rocm-hipify-revamp · 663d5a4d
  Chaitanya Sri Krishna Lolla authored Dec 16, 2020
```
Hipify revamp changes for apex extensions on ROCm.
```
  663d5a4d
16 Dec, 2020 1 commit
- update readme and minor changes · 3fdb8db9
  lcskrishna authored Dec 16, 2020
  
  3fdb8db9
15 Dec, 2020 4 commits
- fixed spelling mistakes · 8efd60b2
  lcskrishna authored Dec 15, 2020
  
  8efd60b2
- update readme and add a note about deprecating old hipification process · 3b917de4
  lcskrishna authored Dec 14, 2020
  
  3b917de4
- fix compile args for multi-tensor extension · f4ad42c1
  lcskrishna authored Dec 14, 2020
  
  f4ad42c1
- refactor based on latest hipify revamp · 91003340
  lcskrishna authored Dec 14, 2020
  
  91003340
10 Dec, 2020 1 commit
- cleanup of extensions · 539bad24
  lcskrishna authored Dec 10, 2020
  
  539bad24
09 Dec, 2020 2 commits
- updated hipify changes for apex contrib · 9b4c68c7
  lcskrishna authored Dec 08, 2020
  
  9b4c68c7
- update setup file for rocm due to newer hipify changes · ef209a74
  lcskrishna authored Dec 08, 2020
  
  ef209a74
04 Dec, 2020 3 commits
- remove pip-version-check noise that hides the outcome of the build (#998) · 8cf5ae61
  Stas Bekman authored Dec 04, 2020
  
  8cf5ae61
- Distributed LAMB fixes (#1007) · 8a80d478
  Kexin Yu authored Dec 03, 2020
```
* add flag for DistributedAdam: step_support_amp_scaling
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
```
  8a80d478
- Seryilmaz/fused dropout softmax (#985) · 3fe10b55
  Burc Eryilmaz authored Dec 03, 2020
```
* fuse dropout into softmax in fprop for additive mask case
```
  3fe10b55
02 Dec, 2020 1 commit

Fix lack of proper loading of best_prec1 from the checkpoint (#1000) · 6c186b3b

Janusz Lisiecki authored Dec 02, 2020



- resume() is a nested function and when it loads best_prec1
  it creates a local variable that hides the one from the parent
  function (which refers to the global one). This PR adds `global`
  to modify the global variable as intended
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

6c186b3b

01 Dec, 2020 1 commit

DistributedFusedAdam Model Parallelism Support (Megatron) (#981) · 6b7e77b0

Kexin Yu authored Dec 01, 2020



DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>

6b7e77b0

04 Nov, 2020 1 commit

Fix LayerNorm op on ROCm (#36) · 7eed38aa

Ashish Farmer authored Nov 04, 2020

* fix warp size in WARP_SHFL* in layernorm

* enable fused_layer_norm tests on ROCm

7eed38aa

19 Oct, 2020 1 commit

Optimize the sync batchnorm by batching the communication (#980) · 8a1ed9e8

lly-zero-one authored Oct 19, 2020

In this PR, we mainly optimize the performance of SyncBatchNorm and also fix one potential issue in the welford_parallel kernel implementation.

- For performance improvement, we batch the mean/var/count all_gather communication together and send it once in the forward path.
- We also batch the all_reduce in the backward path.
- We add the contiguous call on the input of the welford_parallel kernel.

If there is any standard perf benchmark, I would be happy to run it.

8a1ed9e8
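
The batching idea in this commit can be sketched without any GPU or distributed runtime. The code below is a single-process simulation under stated assumptions, not the apex implementation: `FakeAllGather` is a hypothetical stand-in for a real collective, and `merge_stats` uses the standard formula for combining per-rank mean, biased variance, and count. Each rank packs its three statistics into one buffer, so only one gather is needed.

```python
from statistics import fmean, pvariance


def local_stats(samples):
    """Per-rank statistics packed as (mean, biased variance, count)."""
    return fmean(samples), pvariance(samples), float(len(samples))


class FakeAllGather:
    """Hypothetical stand-in for a collective; counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, per_rank_buffers):
        self.calls += 1
        return [list(buf) for buf in per_rank_buffers]


def merge_stats(gathered):
    """Combine (mean, var, count) triples into a global mean and biased variance."""
    total = sum(count for _, _, count in gathered)
    mean = sum(m * count for m, _, count in gathered) / total
    var = sum(count * (v + (m - mean) ** 2) for m, v, count in gathered) / total
    return mean, var


def batched_sync_stats(shards, all_gather):
    """Pack mean/var/count per rank and exchange them with one gather call."""
    packed = [list(local_stats(shard)) for shard in shards]
    gathered = all_gather(packed)  # a single collective instead of three
    return merge_stats(gathered)


# Three simulated ranks with different amounts of data.
shards = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0, 5.0, 5.0, 7.0]]
gather = FakeAllGather()
global_mean, global_var = batched_sync_stats(shards, gather)
```

The merged result matches the mean and biased variance computed over the concatenated data, and `gather.calls` is 1, which is the point of packing the three statistics into one message.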

29 Sep, 2020 1 commit
- use reshape instead of view (#971) · a109f856
  ptrblck authored Sep 28, 2020
  
  a109f856
15 Sep, 2020 1 commit
- Merge pull request #959 from a-maci/update-ASP-readme · 4a1fa2c4
  Thor Johnsen authored Sep 15, 2020
```
Update asp readme
```
  4a1fa2c4
14 Sep, 2020 2 commits
- Merge pull request #5 from a-maci/a-maci-patch-update-asp-readme · 48fc613d
  Asit authored Sep 14, 2020
```
Update README for ASP
```
  48fc613d
- Update README for ASP · e3794f42
  Asit authored Sep 14, 2020
```
Added an outline to illustrate our recommended recipe to obtain a pruned model
```
  e3794f42
21 Aug, 2020 1 commit
- update readme with ninja build instruction and pip3.6 install (#35) · e9c43d67
  Chaitanya Sri Krishna Lolla authored Aug 21, 2020
  
  e9c43d67
18 Aug, 2020 1 commit

[contrib] Support for xentropy extension. (#34) · 3344233f

Chaitanya Sri Krishna Lolla authored Aug 18, 2020

* enable deprecated fused adam optimizer

* enable deprecated fused lamb

* enable xentropy extension

* add warpsize 32 for nv and 64 for amd

* update compiler arguments

* update the syncwarp conditions

* update syncwarp condition

3344233f

17 Aug, 2020 1 commit

[contrib] Support optimizers on rocm. (#33) · 17fbbf91

Chaitanya Sri Krishna Lolla authored Aug 17, 2020

* enable deprecated fused adam optimizer

* enable deprecated fused lamb

* reset the compiler arguments

* syntax error

* aligning the compiler arguments

17fbbf91