1. 28 May, 2020 1 commit
  2. 27 May, 2020 1 commit
      Update Softmax in multihead attention to use the Current Cuda Stream instead of the Default Cuda Stream · 5cb187f3
      Kevin Stephano authored
      
      Update Softmax in multihead attention to use the Current Cuda Stream instead of the Default Cuda Stream. (#843)
      
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention. Clean up files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size. Fix tests that missed the issue.
      
      * Allow for Z dimensions of 64K and greater on batched GEMMs.
      
      * remove redundant imports
      
      * general cleanup, remove deprecated or unused functions
      
      * Update Multihead Attention's softmax to use the current CUDA stream instead of the default stream (a stream-handling sketch follows this commit's change list).
      
      * Fix setup.py that got messed up in merge with upstream.
      
      * Update Multihead Attention strided batched GEMMs to use the current stream instead of the default.
      Co-authored-by: pbialecki <pbialecki@nvidia.com>
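      The last few entries above move the fused softmax and the strided batched GEMM launches off the default CUDA stream and onto the stream PyTorch currently considers active. The sketch below is not the repository's C++ change; it is a minimal Python-level illustration, with arbitrary tensor sizes and plain torch.softmax standing in for the fused kernel, of why a launch hard-coded to the default stream can fall out of order once the caller runs under a side stream.

      ```python
      # Minimal sketch (assumed shapes): work queued under a side stream stays
      # correctly ordered only if every launch targets the current stream
      # (torch.cuda.current_stream()) rather than the default stream.
      import torch

      assert torch.cuda.is_available()
      side_stream = torch.cuda.Stream()
      x = torch.randn(8, 16, device="cuda")  # produced on the default stream

      side_stream.wait_stream(torch.cuda.current_stream())  # x must be ready
      with torch.cuda.stream(side_stream):
          # Inside this context the current stream is side_stream, not the
          # default stream, so a kernel pinned to the default stream would not
          # be ordered after the ops that produced its inputs.
          assert torch.cuda.current_stream() == side_stream
          y = torch.softmax(x, dim=-1)  # enqueued on side_stream

      # Make the default stream wait before anything downstream consumes y.
      torch.cuda.current_stream().wait_stream(side_stream)
      print(y.sum().item())
      ```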
  3. 22 May, 2020 2 commits
  4. 19 May, 2020 1 commit
  5. 14 May, 2020 1 commit
  6. 13 May, 2020 1 commit
  7. 12 May, 2020 2 commits
  8. 08 May, 2020 1 commit
  9. 07 May, 2020 2 commits
  10. 06 May, 2020 3 commits
  11. 05 May, 2020 1 commit
  12. 04 May, 2020 1 commit
  13. 02 May, 2020 3 commits
  14. 01 May, 2020 4 commits
  15. 30 Apr, 2020 6 commits
  16. 29 Apr, 2020 5 commits
  17. 28 Apr, 2020 1 commit
  18. 23 Apr, 2020 1 commit
  19. 22 Apr, 2020 2 commits
    • Deyu Fu
      Fix LARC with mixed precision (#793) · 2ec84ebd
      Vinicius Reis authored
      The LARC optimizer wraps an underlying optimizer and then needs to be passed
      to amp.initialize for mixed precision. Three different crashes occurred in this
      situation; this change fixes all of them and adds a unit test (a usage sketch
      follows below).

      I don't know whether the 'LARC' in sys.modules check ever worked. In my setup,
      the entry in sys.modules is 'apex.parallel.LARC'. Checking whether the variable
      is defined seems more reliable, though.
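      A minimal sketch of the pattern the new unit test is described as covering, assuming a throwaway linear model and arbitrary hyperparameters (none of these names come from the commit): LARC wraps the base optimizer, and the wrapped optimizer is the object handed to amp.initialize.

      ```python
      # Sketch only: hypothetical model and data; LARC wraps the real optimizer,
      # and the wrapped object is what gets passed to amp.initialize.
      import torch
      import torch.nn.functional as F
      from apex import amp
      from apex.parallel.LARC import LARC

      model = torch.nn.Linear(128, 10).cuda()                 # placeholder model
      base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
      optimizer = LARC(base_optimizer)                         # LARC wraps SGD

      # The wrapped optimizer must survive amp's patching; this is the call that
      # used to crash before the fix.
      model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

      data = torch.randn(32, 128).cuda()
      target = torch.randint(0, 10, (32,)).cuda()
      loss = F.cross_entropy(model(data), target)
      with amp.scale_loss(loss, optimizer) as scaled_loss:
          scaled_loss.backward()
      optimizer.step()
      ```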
  20. 20 Apr, 2020 1 commit