Commits · 37d8410c14447c3e72ee03882c767b5cd4e801ab · OpenDAS / apex

01 Sep, 2021 1 commit
- work around hipify not finding headers · 888e72ad
  Jeff Daily authored Sep 01, 2021
  
31 Aug, 2021 1 commit
- enable --distributed_lamb for rocm · 955256d1
  Jeff Daily authored Aug 31, 2021
  
25 Jun, 2021 1 commit
- Make torch version check numeric · 799785ab
  Jithun Nair authored Jun 25, 2021
  
23 Feb, 2021 1 commit
- fast layer norm (#1037) · e2083df5
  yjk21 authored Feb 23, 2021
  
21 Jan, 2021 1 commit
- fix cross-compiled ROCm builds when no GPUs detected (#45) · c1e88fae
  Jeff Daily authored Jan 21, 2021
  
18 Jan, 2021 1 commit

update setup.py to more closely align with upstream · 2332c4d6

Jeff Daily authored Jan 18, 2021

Mostly whitespace or formatting issues addressed.
Diff with upstream is reduced; ROCm changes are more clear.


16 Dec, 2020 1 commit
- update readme and minor changes · 3fdb8db9
  lcskrishna authored Dec 16, 2020
  
15 Dec, 2020 3 commits
- fixed spelling mistakes · 8efd60b2
  lcskrishna authored Dec 15, 2020
  
- fix compile args for multi-tensor extension · f4ad42c1
  lcskrishna authored Dec 14, 2020
  
- refactor based on latest hipify revamp · 91003340
  lcskrishna authored Dec 14, 2020
  
10 Dec, 2020 1 commit
- cleanup of extensions · 539bad24
  lcskrishna authored Dec 10, 2020
  
09 Dec, 2020 2 commits
- updated hipify changes for apex contrib · 9b4c68c7
  lcskrishna authored Dec 08, 2020
  
- update setup file for rocm due to newer hipify changes · ef209a74
  lcskrishna authored Dec 08, 2020
  
01 Dec, 2020 1 commit

DistributedFusedAdam Model Parallelism Support (Megatron) (#981) · 6b7e77b0

Kexin Yu authored Dec 01, 2020

DistributedFusedAdam Model Parallelism Support (Megatron)

Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>


18 Aug, 2020 1 commit

[contrib] Support for xentropy extension. (#34) · 3344233f

Chaitanya Sri Krishna Lolla authored Aug 18, 2020

* enable deprecated fused adam optimizer

* enable deprecated fused lamb

* enable xentropy extension

* add warpsize 32 for nv and 64 for amd

* update compiler arguments

* update the syncwarp conditions

* update syncwarp condition


17 Aug, 2020 1 commit

[contrib] Support optimizers on rocm. (#33) · 17fbbf91

Chaitanya Sri Krishna Lolla authored Aug 17, 2020

* enable deprecated fused adam optimizer

* enable deprecated fused lamb

* reset the compiler arguments

* syntax error

* aligning the compiler arguments


10 Aug, 2020 1 commit
- move sm80 code inside MHA (#937) · 5d9b5cbc
  ptrblck authored Aug 10, 2020

  Co-authored-by: pbialecki <pbialecki@nvidia.com>
05 Aug, 2020 1 commit

Enable mlp_cuda extension. (#28) · d2f6d04a

Chaitanya Sri Krishna Lolla authored Aug 05, 2020

* enable mlp cuda

* add setup changes and tests

* skip the unit tests

* updated conditions for empty array

* removed hip platform conditions


01 Aug, 2020 1 commit
- Add sm80 for CUDA >= 11 (#925) · 5b53121a
  ptrblck authored Aug 01, 2020
  
10 Jul, 2020 1 commit

Enable sync batchnorm extension. (#27) · 9c80f6d3

Chaitanya Sri Krishna Lolla authored Jul 10, 2020

* Enable sync batchnorm

* enable syncbn properly

* update the unit tests

* update tests

* update conditions for welford_merge_element

* updated conditions based on comments.


01 Jun, 2020 1 commit
- Add Pyprof removal warnings that point to new repo (#862) · 097238f8
  mcarilli authored Jun 01, 2020

  Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
30 May, 2020 2 commits
- Make separate apex option for distributed lamb · 45388d48
  Thor Johnsen authored May 30, 2020
  
- Distributed LAMB optimizer · 19892f1d
  Thor Johnsen authored May 30, 2020
  
29 May, 2020 1 commit

Fuses dropout and softmax in backward pass, add bias support to CPP MHA, add... · 6c2babf9

Burc Eryilmaz authored May 29, 2020

Fuses dropout and softmax in backward pass, add bias support to CPP MHA, add additive mask support, separate Q/K/V parameters (#854)

Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>


20 May, 2020 1 commit
- Fix compile args, adding version_dependent_macros. (#12) · 267e696d
  Jeff Daily authored May 20, 2020
  
18 May, 2020 1 commit
- enable multi tensor apply fusedadagrad (#9) · 65490af6
  Chaitanya Sri Krishna Lolla authored May 18, 2020
  
14 May, 2020 1 commit
- Add FusedAdagrad (#822) · 3bae8c83
  Andrew Tulloch authored May 14, 2020
  
07 May, 2020 1 commit
- Enable fusedlayernorm extension (#3) · 2d0f9cf2
  Chaitanya Sri Krishna Lolla authored May 07, 2020
  
28 Apr, 2020 1 commit

Enable Apex on ROCm and support multi tensor support. (#1) · 8124df13

Chaitanya Sri Krishna Lolla authored Apr 28, 2020

* Initial commit to hipify all cuda code

* enable multi_tensor_apply extension

* added generatedFileCleaner to handle nested hip files


23 Apr, 2020 1 commit

CUDAGenerator fix for #36026 (#801) · 1f2aa915

ptrblck authored Apr 22, 2020

* add CUDAGenerator guard

* fix generator_flag

* add guards for gen pointer/ref issue

* change mutex_ to mutex()

* add check_generator

Co-authored-by: pbialecki <pbialecki@nvidia.com>


22 Apr, 2020 1 commit
- initial commit to add Multilayer Perceptron (MLP) extension (#790) · 71511faf
  Deyu Fu authored Apr 22, 2020
  
23 Mar, 2020 1 commit
- add l2norm source for FusedLAMB · a3ffb8a7
  Kexin Yu authored Mar 23, 2020
  
20 Mar, 2020 2 commits
- extension name fix · b4c32010
  Kexin Yu authored Mar 20, 2020
  
- apex.contrib.optimizers.FuseLamb first commit · b222ed2b
  Kexin Yu authored Mar 19, 2020
  
11 Mar, 2020 1 commit

Fix deprecated calls in multihead_attn and ninja build failure (#746) · 80b90b9d

ptrblck authored Mar 11, 2020

* disable ninja for multihead_attn

* fix getCurrentStream in multihead_attn

Co-authored-by: pbialecki <pbialecki@nvidia.com>


02 Mar, 2020 1 commit
- Revert "remove gencode from multihead_attn build (#731)" · 5633f6db
  pbialecki authored Mar 01, 2020

  This reverts commit 92b3b9a9.
27 Feb, 2020 1 commit
- NHWC support for multi tensor apply (#732) · de6378f5
  mcarilli authored Feb 26, 2020

  * NHWC support for multi tensor apply

  * compilation fix for version<=1.4
25 Feb, 2020 2 commits
- remove gencode from multihead_attn build (#731) · 92b3b9a9
  ptrblck authored Feb 25, 2020
  
- remove duplicated multihead_attn install (#729) · 5f6b9b0e
  ptrblck authored Feb 24, 2020
  
24 Feb, 2020 1 commit

Change to Multihead Attention to allow Batched GEMMs larger than 64K. (#728) · 1733946a

Kevin Stephano authored Feb 24, 2020

* Adding C++ Multihead Attention implementation to contrib.

* Add reference test that at least works for forward.

* Remove CublasLt support from multihead attention.

* Add new Python version of self attention.

* Update python model of MHA with backward pass.

* Fixed Output Linear connection in MHA.

* Clean up compiles and add documentation to PySelfAttention.

* Add Encdec Python version of multihead attention.  Cleanup files.

* Tests for self and encdec multihead attention.

* Add reference pytorch implementation of attention with norm and add.

* Add cutlass branch definition.

* Add cutlass download to compile.

* Add norm/add tests.

* Add biases to pytorch python versions.

* Add tests and fix issues with python version of attention masking.

* Create README.md

* Update README.md

* Update README.md

* Update perf test parameters.

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Fix matmul1 output tensor size.  Fix tests that missed issue.

* Allow for Z dimensions of 64K and greater on batched GEMMs.

* remove redundant imports

* general cleanup, remove deprecated or unused functions
