Commits · cc92a4b47dc45a6badb384ce2c68e43940e380fa · OpenDAS / apex

17 Apr, 2021 1 commit

Adding fast bottleneck implementation into contrib (#1079) · 705cba91

Deyu Fu authored Apr 17, 2021

* initial commit for adding fast bottleneck

* sync cudnn-frontend module

Co-authored-by: pbialecki <pbialecki@nvidia.com>

06 Feb, 2020 1 commit

Add Fast Multihead Attention to APEX Contrib (#697) · 3f94528e

Kevin Stephano authored Feb 06, 2020

* Adding C++ Multihead Attention implementation to contrib.

* Add reference test that at least works for forward.

* Remove CublasLt support from multihead attention.

* Add new Python version of self attention.

* Update python model of MHA with backward pass.

* Fixed Output Linear connection in MHA.

* Clean up compiles and add documentation to PySelfAttention.

* Add Encdec Python version of multihead attention.  Cleanup files.

* Tests for self and encdec multihead attention.

* Add reference pytorch implementation of attention with norm and add.

* Add cutlass branch definition.

* Add cutlass download to compile.

* Add norm/add tests.

* Add biases to pytorch python versions.

* Add tests and fix issues with python version of attention masking.

* Create README.md

* Update README.md

* Update README.md

* Update perf test parameters.

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Fix matmul1 output tensor size.  Fix tests that missed issue.
