Commits · 02ada95dd72753694d1423012d8e0e5cd3f23737 · OpenDAS / apex

31 Aug, 2021 2 commits
- Merge pull request #52 from ROCmSoftwarePlatform/add_distributed_fused_lamb · 02ada95d
  Jithun Nair authored Aug 31, 2021
```
add distributed fused lamb
```
  02ada95d
- enable --distributed_lamb for rocm · 955256d1
  Jeff Daily authored Aug 31, 2021
  
  955256d1
25 Jun, 2021 2 commits
- Merge pull request #50 from ROCmSoftwarePlatform/numeric_torch_version_check · 95797c8d
  Jeff Daily authored Jun 25, 2021
```
Make torch version check numeric
```
  95797c8d
- Make torch version check numeric · 799785ab
  Jithun Nair authored Jun 25, 2021
  
  799785ab
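A short sketch of why a numeric version check matters. The commit message does not show what the earlier check looked like, so the string comparison below is only an assumed illustration of the failure mode, and `parse_major_minor` is a hypothetical helper rather than code from apex.

```python
def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) as integers from a string such as '1.10.0+rocm'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# Comparing strings is lexicographic, so "1.10" sorts before "1.8".
string_check = "1.10" >= "1.8"

# Comparing integer tuples orders the same versions correctly.
numeric_check = parse_major_minor("1.10.0") >= (1, 8)
```

Here `string_check` evaluates to `False` even though 1.10 is the newer release, while `numeric_check` evaluates to `True`.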
04 Mar, 2021 3 commits
- Merge pull request #48 from ROCmSoftwarePlatform/IFU-2020-03-04 · 107f1ff5
  Jeff Daily authored Mar 04, 2021
```
IFU-2020-03-04
```
  107f1ff5
- Merge remote-tracking branch 'upstream/master' into IFU-2020-03-04 · c285a67c
  Jeff Daily authored Mar 04, 2021
  
  c285a67c
- Merge pull request #47 from ROCmSoftwarePlatform/revert_workaround · dde39c9f
  Peng authored Mar 04, 2021
```
Revert "pass all TensorListMetadata as pointer to pinned host memory (#13)"
```
  dde39c9f
25 Feb, 2021 1 commit
- Revert "pass all TensorListMetadata as pointer to pinned host memory (#13)" · fbb8cd93
  Jeff Daily authored Feb 25, 2021
```
This reverts commit bdd481d1.
```
  fbb8cd93
23 Feb, 2021 1 commit
- fast layer norm (#1037) · e2083df5
  yjk21 authored Feb 23, 2021
  
  e2083df5
10 Feb, 2021 1 commit
- fix import container_abcs issue (#1049) · a78ccf0b
  Shoufa Chen authored Feb 10, 2021

* copy-paste friendly

* fix import container_abcs issue

Nightly PyTorch has removed `container_abcs` from `torch._six`.  https://github.com/pytorch/pytorch/commit/58eb23378f2a376565a66ac32c93a316c45b6131#diff-b3c160475f0fbe8ad50310f92d3534172ba98203387a962b7dc8f4a23b15cf4dL35

* keep existing for pytorch1.7 and earlier

a78ccf0b
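One way to stay compatible on both sides of the `container_abcs` removal is a guarded import. This is a minimal sketch that assumes a try/except fallback to the standard library; the commit itself may select the import differently (for example by checking the PyTorch version), and `is_sequence` is a hypothetical helper used only to exercise the alias.

```python
import collections.abc

try:
    # PyTorch 1.7 and earlier expose container_abcs through torch._six.
    from torch._six import container_abcs
except ImportError:
    # Later PyTorch builds no longer provide it (this branch also runs when
    # torch is not installed), so fall back to the standard library module.
    container_abcs = collections.abc


def is_sequence(obj) -> bool:
    """Check an object against the Sequence ABC via the compatibility alias."""
    return isinstance(obj, container_abcs.Sequence)
```

Either branch leaves `container_abcs` bound to a module that provides the same abstract base classes, so calling code does not need to change.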

25 Jan, 2021 1 commit
- fix bugs in syncbn (#46) · 3f49dbf0
  Jeff Daily authored Jan 25, 2021

- incorrect use of __shfl_down
- fix warp size assumptions
- update unit tests to exit on failure

3f49dbf0

21 Jan, 2021 2 commits
- fix cross-compiled ROCm builds when no GPUs detected (#45) · c1e88fae
  Jeff Daily authored Jan 21, 2021
  
  c1e88fae
- use __launch_bounds__ for multi_tensor_apply (#44) · 5baa68d3
  Jeff Daily authored Jan 21, 2021
```
use __launch_bounds__(1024) for multi_tensor_apply, re-enable skipped tests
```
  5baa68d3
20 Jan, 2021 1 commit
- cuda rng changes for graph capture with apex MHA (#1025) · eefb1ba2
  Burc Eryilmaz authored Jan 20, 2021
```
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
```
  eefb1ba2
19 Jan, 2021 1 commit
- Merge pull request #43 from ROCmSoftwarePlatform/IFU-2021-01-18 · 85b56d01
  Jeff Daily authored Jan 19, 2021
```
IFU-2021-01-18
```
  85b56d01
18 Jan, 2021 5 commits
- skip failing tests on ROCm · 13c8d152
  Jeff Daily authored Jan 18, 2021
  
  13c8d152
- missing #include <c10/cuda/CUDAGuard.h> · 4ebf2b90
  Jeff Daily authored Jan 18, 2021
  
  4ebf2b90
- update setup.py to more closely align with upstream · 2332c4d6
  Jeff Daily authored Jan 18, 2021
```
Mostly whitespace or formatting issues addressed.
Diff with upstream is reduced; ROCm changes are more clear.
```
  2332c4d6
- Merge remote-tracking branch 'upstream/master' · dcc7b513
  Jeff Daily authored Jan 18, 2021
```
Conflicts:
csrc/multi_tensor_apply.cuh
setup.py
tests/L0/run_optimizers/test_adagrad.py
tests/L0/run_optimizers/test_fused_optimizer.py
tests/L0/run_optimizers/test_lamb.py
```
  dcc7b513
- Merge pull request #42 from sarunyap/reduce-block-fix · d061bf20
  Jeff Daily authored Jan 18, 2021
```
Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm
```
  d061bf20
15 Jan, 2021 1 commit
- Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm · ff232fb8
  Sarunya Pumma authored Nov 28, 2020
  
  ff232fb8
31 Dec, 2020 3 commits
- Merge pull request #41 from lcskrishna/cl/skip-tests · 76e4e054
  Chaitanya Sri Krishna Lolla authored Dec 31, 2020
```
Skip the unit tests
```
  76e4e054
- missing import statement · 41bbf93c
  lcskrishna authored Dec 31, 2020
  
  41bbf93c
- skip the unit tests · 5bae299e
  lcskrishna authored Dec 31, 2020
  
  5bae299e
17 Dec, 2020 3 commits
- Merge pull request #1015 from jpool-nv/patch-1 · 154c6336
  Thor Johnsen authored Dec 17, 2020
```
Update ASP README to highlight default recipe
```
  154c6336
- Update ASP README to highlight default recipe · 56914d4f
  jpool-nv authored Dec 17, 2020
```
The Recipe was presented after some non-standard API calls, so moving the suggested usage up, giving it its own section, and reinforcing the suggested usage in the non-standard section.
```
  56914d4f
- Merge pull request #38 from lcskrishna/cl/rocm-hipify-revamp · 663d5a4d
  Chaitanya Sri Krishna Lolla authored Dec 16, 2020
```
Hipify revamp changes for apex extensions on ROCm.
```
  663d5a4d
16 Dec, 2020 1 commit
- update readme and minor changes · 3fdb8db9
  lcskrishna authored Dec 16, 2020
  
  3fdb8db9
15 Dec, 2020 4 commits
- fixed spelling mistakes · 8efd60b2
  lcskrishna authored Dec 15, 2020
  
  8efd60b2
- update readme and add a note about deprecating old hipification process · 3b917de4
  lcskrishna authored Dec 14, 2020
  
  3b917de4
- fix compile args for multi-tensor extension · f4ad42c1
  lcskrishna authored Dec 14, 2020
  
  f4ad42c1
- refactor based on latest hipify revamp · 91003340
  lcskrishna authored Dec 14, 2020
  
  91003340
10 Dec, 2020 1 commit
- cleanup of extensions · 539bad24
  lcskrishna authored Dec 10, 2020
  
  539bad24
09 Dec, 2020 2 commits
- updated hipify changes for apex contrib · 9b4c68c7
  lcskrishna authored Dec 08, 2020
  
  9b4c68c7
- update setup file for rocm due to newer hipify changes · ef209a74
  lcskrishna authored Dec 08, 2020
  
  ef209a74
04 Dec, 2020 3 commits
- remove pip-version-check noise that hides the outcome of the build (#998) · 8cf5ae61
  Stas Bekman authored Dec 04, 2020
  
  8cf5ae61
- Distributed LAMB fixes (#1007) · 8a80d478
  Kexin Yu authored Dec 03, 2020
```
* add flag for DistributedAdam: step_support_amp_scaling
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
```
  8a80d478
- Seryilmaz/fused dropout softmax (#985) · 3fe10b55
  Burc Eryilmaz authored Dec 03, 2020
```
* fuse dropout into softmax in fprop for additive mask case
```
  3fe10b55
02 Dec, 2020 1 commit
- Fix lack of proper loading of best_prec1 from the checkpoint (#1000) · 6c186b3b
  Janusz Lisiecki authored Dec 02, 2020

- resume() is a nested function and when it loads best_prec1
  it creates a local variable that hides the one from the parent
  function (which refers to the global one). This PR adds `global`
  to modify the global variable as intended
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

6c186b3b
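The scoping problem described in this commit message can be reproduced in a few lines. This is a sketch under assumptions: the function names `main_without_global` and `main_with_global` and the checkpoint dictionary are hypothetical stand-ins, not the code of the script that was patched.

```python
best_prec1 = 0.0  # module-level value, as in a training script


def main_without_global():
    global best_prec1  # the parent function refers to the global

    def resume(checkpoint):
        # A global declaration in the parent does not carry into a nested
        # function, so this assignment creates a new local name.
        best_prec1 = checkpoint["best_prec1"]
        return best_prec1

    resume({"best_prec1": 76.5})


def main_with_global():
    global best_prec1

    def resume(checkpoint):
        global best_prec1  # rebind the module-level name as intended
        best_prec1 = checkpoint["best_prec1"]

    resume({"best_prec1": 76.5})


main_without_global()
before_fix = best_prec1  # still the initial value

main_with_global()
after_fix = best_prec1  # now the value loaded from the checkpoint
```

Without the declaration inside `resume`, the loaded value is discarded when the nested function returns, which matches the symptom in the commit title.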

01 Dec, 2020 1 commit
- DistributedFusedAdam Model Parallelism Support (Megatron) (#981) · 6b7e77b0
  Kexin Yu authored Dec 01, 2020

DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>

6b7e77b0