1. 18 Mar, 2022 1 commit
  2. 09 Dec, 2021 2 commits
  3. 06 Dec, 2021 1 commit
    • remove THC headers/functions (#1192) · 2155dabf
      Masaki Kozuki authored
      Changes include
      - THC headers removal
      - TH macros replacement
      - fix some typos in comments
       Conflicts:
      	apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.cu
      	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_cuda.cu
      	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_norm_add_cuda.cu
      	apex/contrib/csrc/multihead_attn/masked_softmax_dropout_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_additive_mask_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_norm_add_cuda.cu
      	apex/contrib/csrc/multihead_attn/strided_batched_gemm.h
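
      Note: for readers unfamiliar with the TH/THC deprecation, the sketch below shows the general shape of such a cleanup; it is a minimal, assumed example, not the actual diff of 2155dabf. The legacy THC header and its THCudaCheck error-check macro are replaced by the c10 CUDA equivalents that PyTorch now ships.

      // Illustrative THC -> c10 cleanup (assumed example, not the apex diff).
      //
      // Before: legacy THC header and error-check macro.
      //   #include <THC/THC.h>
      //   THCudaCheck(cudaGetLastError());
      //
      // After: the c10 replacement.
      #include <cuda_runtime.h>
      #include <c10/cuda/CUDAException.h>   // provides C10_CUDA_CHECK

      void check_last_kernel_launch() {
        // C10_CUDA_CHECK raises a c10::Error with file/line context on failure,
        // covering roughly what THCudaCheck used to do.
        C10_CUDA_CHECK(cudaGetLastError());
      }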
  4. 19 Oct, 2021 1 commit
  5. 18 Oct, 2021 1 commit
  6. 16 Oct, 2021 1 commit
  7. 27 May, 2020 1 commit
    • Update Softmax in multihead attention to use the Current Cuda Stream instead of the Default Cuda Stream. (#843) · 5cb187f3
      Kevin Stephano authored
      
      Update Softmax in multihead attention to use the Current Cuda Stream instead of the Default Cuda Stream. (#843)
      
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size.  Fix tests that missed issue.
      
      * Allow for Z dimensions of 64K and greater on batched GEMMs.
      
      * remove redundant imports
      
      * general cleanup, remove deprecated or unused functions
      
      * Update Multihead Attention's softmax to use the Current Stream instead of the default stream.
      
      * Fix setup.py that got messed up in merge with upstream.
      
      * Update Multihead Attention strided batched gemms to use the current stream instead of the default.
      Co-authored-by: pbialecki <pbialecki@nvidia.com>
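
      The stream change in 5cb187f3 boils down to launching the softmax kernels, and binding the cuBLAS handle used for the strided batched GEMMs, to PyTorch's current CUDA stream instead of the legacy default stream, which can otherwise serialize against or race with surrounding PyTorch ops. A minimal sketch under that assumption (illustrative kernel and function names, not the contrib code):

      #include <ATen/cuda/CUDAContext.h>  // at::cuda::getCurrentCUDAStream, getCurrentCUDABlasHandle
      #include <cublas_v2.h>

      // Placeholder kernel standing in for the real softmax kernels in the
      // multihead_attn .cu files.
      __global__ void softmax_like_kernel(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = x[i];
      }

      void launch_on_current_stream(float* x, int n) {
        // Stream PyTorch is currently recording work on for this device/thread.
        cudaStream_t stream = at::cuda::getCurrentCUDAStream();

        softmax_like_kernel<<<(n + 255) / 256, 256, 0, stream>>>(x, n);

        // Same idea for the strided batched GEMMs: make sure the cuBLAS handle
        // is bound to that stream before the cublasGemmStridedBatched* call.
        cublasHandle_t handle = at::cuda::getCurrentCUDABlasHandle();
        cublasSetStream(handle, stream);
      }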
  8. 11 Mar, 2020 1 commit
  9. 24 Feb, 2020 1 commit
    • Change to Multihead Attention to allow Batched GEMMs larger than 64K. (#728) · 1733946a
      Kevin Stephano authored
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size.  Fix tests that missed issue.
      
      * Allow for Z dimensions of 64K and greater on batched GEMMs.
      
      * remove redundant imports
      
      * general cleanup, remove deprecated or unused functions
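
      The 64K limit referenced in #728 is most likely CUDA's grid-dimension constraint: gridDim.y and gridDim.z may not exceed 65535, so a batched kernel that maps the batch index onto blockIdx.z cannot cover 64K or more matrices in a single launch. A common workaround, sketched below as an assumption (this is not the actual strided_batched_gemm.h change), is to chunk the batch into launches of at most 65535 and offset the pointers by the batch stride:

      #include <cuda_runtime.h>
      #include <algorithm>
      #include <cstdint>

      // Hypothetical batched kernel: blockIdx.z picks the matrix within the chunk.
      __global__ void batched_tile_kernel(const float* a, float* c,
                                          int64_t stride_a, int64_t stride_c,
                                          int64_t batch_offset) {
        int64_t batch = batch_offset + blockIdx.z;
        const float* a_mat = a + batch * stride_a;
        float* c_mat = c + batch * stride_c;
        // ... per-matrix tile work indexed by blockIdx.x / blockIdx.y ...
        (void)a_mat; (void)c_mat;
      }

      void launch_large_batch(const float* a, float* c,
                              int64_t stride_a, int64_t stride_c,
                              int64_t batch_count, dim3 tile_grid, dim3 block,
                              cudaStream_t stream) {
        const int64_t kMaxGridZ = 65535;  // hardware ceiling on gridDim.z
        for (int64_t start = 0; start < batch_count; start += kMaxGridZ) {
          int64_t chunk = std::min(kMaxGridZ, batch_count - start);
          dim3 grid(tile_grid.x, tile_grid.y, static_cast<unsigned>(chunk));
          batched_tile_kernel<<<grid, block, 0, stream>>>(a, c, stride_a, stride_c, start);
        }
      }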
  10. 06 Feb, 2020 1 commit
    • Add Fast Multihead Attention to APEX Contrib (#697) · 3f94528e
      Kevin Stephano authored
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size.  Fix tests that missed issue.
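
      For orientation on the "Fix matmul1 output tensor size" item: in a standard multihead self-attention, the first batched GEMM (Q x K^T) must produce one attention map per head, and the second GEMM consumes it. The shape bookkeeping below is a hedged reference sketch with illustrative names, not the contrib module's API:

      #include <array>
      #include <cstdint>

      struct MhaShapes {
        std::array<int64_t, 3> scores;   // "matmul1" output: Q x K^T
        std::array<int64_t, 3> context;  // "matmul2" output: softmax(scores) x V
      };

      MhaShapes self_attn_shapes(int64_t seq_q, int64_t seq_k,
                                 int64_t batch, int64_t heads, int64_t head_dim) {
        // matmul1: [batch*heads, seq_q, head_dim] x [batch*heads, head_dim, seq_k]
        //       -> [batch*heads, seq_q, seq_k]
        // matmul2: [batch*heads, seq_q, seq_k]   x [batch*heads, seq_k, head_dim]
        //       -> [batch*heads, seq_q, head_dim]
        return {{batch * heads, seq_q, seq_k},
                {batch * heads, seq_q, head_dim}};
      }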