1. 18 Mar, 2022 1 commit
  2. 09 Dec, 2021 2 commits
  3. 06 Dec, 2021 1 commit
    • remove THC headers/functions (#1192) · 2155dabf
      Masaki Kozuki authored
      Changes include
      - THC headers removal
      - TH macros replacement
      - fix some typos in comments
       Conflicts:
      	apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.cu
      	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_cuda.cu
      	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_norm_add_cuda.cu
      	apex/contrib/csrc/multihead_attn/masked_softmax_dropout_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_additive_mask_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_norm_add_cuda.cu
      	apex/contrib/csrc/multihead_attn/strided_batched_gemm.h
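
      Note: for readers unfamiliar with the TH/THC deprecation, the sketch below shows the general shape of such a cleanup; it is a minimal, assumed example, not the actual diff of 2155dabf. The legacy THC header and its THCudaCheck error-check macro are replaced by the c10 CUDA equivalents that PyTorch now ships.

      // Illustrative THC -> c10 cleanup (assumed example, not the apex diff).
      //
      // Before: legacy THC header and error-check macro.
      //   #include <THC/THC.h>
      //   THCudaCheck(cudaGetLastError());
      //
      // After: the c10 replacement.
      #include <cuda_runtime.h>
      #include <c10/cuda/CUDAException.h>   // provides C10_CUDA_CHECK

      void check_last_kernel_launch() {
        // C10_CUDA_CHECK raises a c10::Error with file/line context on failure,
        // covering roughly what THCudaCheck used to do.
        C10_CUDA_CHECK(cudaGetLastError());
      }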
  4. 19 Oct, 2021 1 commit
  5. 18 Oct, 2021 1 commit
  6. 16 Oct, 2021 1 commit
  7. 27 May, 2020 1 commit
    • Update Softmax in multihead attention to use the Current Cuda Stream instead of the Default Cuda Stream. (#843) · 5cb187f3
      Kevin Stephano authored
      
      Update Softmax in multihead attention to use the Current Cuda Stream instead of the Default Cuda Stream. (#843)
      
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size.  Fix tests that missed issue.
      
      * Allow for Z dimensions of 64K and greater on batched GEMMs.
      
      * remove redundant imports
      
      * general cleanup, remove deprecated or unused functions
      
      * Update Multihead Attention's softmax to use the Current Stream instead of the default stream.
      
      * Fix setup.py that got messed up in merge with upstream.
      
      * Update Multihead Attention strided batched gemms to use the current stream instead of the default.
      Co-authored-by: pbialecki <pbialecki@nvidia.com>
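
      The stream change in 5cb187f3 boils down to launching the softmax kernels, and binding the cuBLAS handle used for the strided batched GEMMs, to PyTorch's current CUDA stream instead of the legacy default stream, which can otherwise serialize against or race with surrounding PyTorch ops. A minimal sketch under that assumption (illustrative kernel and function names, not the contrib code):

      #include <ATen/cuda/CUDAContext.h>  // at::cuda::getCurrentCUDAStream, getCurrentCUDABlasHandle
      #include <cublas_v2.h>

      // Placeholder kernel standing in for the real softmax kernels in the
      // multihead_attn .cu files.
      __global__ void softmax_like_kernel(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = x[i];
      }

      void launch_on_current_stream(float* x, int n) {
        // Stream PyTorch is currently recording work on for this device/thread.
        cudaStream_t stream = at::cuda::getCurrentCUDAStream();

        softmax_like_kernel<<<(n + 255) / 256, 256, 0, stream>>>(x, n);

        // Same idea for the strided batched GEMMs: make sure the cuBLAS handle
        // is bound to that stream before the cublasGemmStridedBatched* call.
        cublasHandle_t handle = at::cuda::getCurrentCUDABlasHandle();
        cublasSetStream(handle, stream);
      }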
  8. 11 Mar, 2020 1 commit
  9. 24 Feb, 2020 1 commit
    • Change to Multihead Attention to allow Batched GEMMs larger than 64K. (#728) · 1733946a
      Kevin Stephano authored
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size.  Fix tests that missed issue.
      
      * Allow for Z dimensions of 64K and greater on batched GEMMs.
      
      * remove redundant imports
      
      * general cleanup, remove deprecated or unused functions
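
      The 64K limit referenced in #728 is most likely CUDA's grid-dimension constraint: gridDim.y and gridDim.z may not exceed 65535, so a batched kernel that maps the batch index onto blockIdx.z cannot cover 64K or more matrices in a single launch. A common workaround, sketched below as an assumption (this is not the actual strided_batched_gemm.h change), is to chunk the batch into launches of at most 65535 and offset the pointers by the batch stride:

      #include <cuda_runtime.h>
      #include <algorithm>
      #include <cstdint>

      // Hypothetical batched kernel: blockIdx.z picks the matrix within the chunk.
      __global__ void batched_tile_kernel(const float* a, float* c,
                                          int64_t stride_a, int64_t stride_c,
                                          int64_t batch_offset) {
        int64_t batch = batch_offset + blockIdx.z;
        const float* a_mat = a + batch * stride_a;
        float* c_mat = c + batch * stride_c;
        // ... per-matrix tile work indexed by blockIdx.x / blockIdx.y ...
        (void)a_mat; (void)c_mat;
      }

      void launch_large_batch(const float* a, float* c,
                              int64_t stride_a, int64_t stride_c,
                              int64_t batch_count, dim3 tile_grid, dim3 block,
                              cudaStream_t stream) {
        const int64_t kMaxGridZ = 65535;  // hardware ceiling on gridDim.z
        for (int64_t start = 0; start < batch_count; start += kMaxGridZ) {
          int64_t chunk = std::min(kMaxGridZ, batch_count - start);
          dim3 grid(tile_grid.x, tile_grid.y, static_cast<unsigned>(chunk));
          batched_tile_kernel<<<grid, block, 0, stream>>>(a, c, stride_a, stride_c, start);
        }
      }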
  10. 06 Feb, 2020 1 commit
    • Add Fast Multihead Attention to APEX Contrib (#697) · 3f94528e
      Kevin Stephano authored
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size.  Fix tests that missed issue.
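
      For orientation on the "Fix matmul1 output tensor size" item: in a standard multihead self-attention, the first batched GEMM (Q x K^T) must produce one attention map per head, and the second GEMM consumes it. The shape bookkeeping below is a hedged reference sketch with illustrative names, not the contrib module's API:

      #include <array>
      #include <cstdint>

      struct MhaShapes {
        std::array<int64_t, 3> scores;   // "matmul1" output: Q x K^T
        std::array<int64_t, 3> context;  // "matmul2" output: softmax(scores) x V
      };

      MhaShapes self_attn_shapes(int64_t seq_q, int64_t seq_k,
                                 int64_t batch, int64_t heads, int64_t head_dim) {
        // matmul1: [batch*heads, seq_q, head_dim] x [batch*heads, head_dim, seq_k]
        //       -> [batch*heads, seq_q, seq_k]
        // matmul2: [batch*heads, seq_q, seq_k]   x [batch*heads, seq_k, head_dim]
        //       -> [batch*heads, seq_q, head_dim]
        return {{batch * heads, seq_q, seq_k},
                {batch * heads, seq_q, head_dim}};
      }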