- 01 Jul, 2020 (1 commit)
  - Kirthi Sivamani authored
- 10 Jun, 2020 (4 commits)
  - Kirthi Sivamani authored
  - Kirthi Sivamani authored
  - Kirthi Sivamani authored
  - Kirthi Sivamani authored
- 01 Jun, 2020 (3 commits)
  - mcarilli authored
    Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
  - Thor Johnsen authored: Remove distributed lamb from __init__.py
  - Thor Johnsen authored
- 31 May, 2020 (6 commits)
  - Thor Johnsen authored: Distributed lamb optimizer
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 30 May, 2020 (19 commits)
  - Thor Johnsen authored (×19)
- 29 May, 2020 (3 commits)
  - Kevin Stephano authored
  - Burc Eryilmaz authored: Fuse dropout and softmax in the backward pass, add bias support to the C++ MHA, add additive mask support, and separate the Q/K/V parameters (#854)
    Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
  - Kexin Yu authored: make FusedLAMB async
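
For context on the Burc Eryilmaz commit above: the operation being fused is the softmax-plus-dropout step of multihead attention. A minimal unfused reference in plain PyTorch, with illustrative names and shapes (this is not apex's API, only the math the fused backward covers):

```python
import torch
import torch.nn.functional as F

def attn_probs_reference(scores, additive_mask=None, p_drop=0.1, training=True):
    """Unfused reference: additive mask -> softmax -> dropout on attention scores."""
    # scores: (batch * heads, seq_q, seq_k); names and shapes are illustrative only
    if additive_mask is not None:
        scores = scores + additive_mask                   # additive mask support
    probs = F.softmax(scores, dim=-1)                     # softmax over the key dimension
    return F.dropout(probs, p=p_drop, training=training)  # dropout on the probabilities

scores = torch.randn(8 * 16, 64, 64, device="cuda", requires_grad=True)
mask = torch.zeros(8 * 16, 64, 64, device="cuda")
attn_probs_reference(scores, mask).sum().backward()  # this backward pass is the part apex fuses
```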
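The Kexin Yu commit above modifies apex's FusedLAMB optimizer. A minimal usage sketch, assuming apex was built with its CUDA extensions; the model and hyperparameters are placeholders:

```python
import torch
from apex.optimizers import FusedLAMB  # fused multi-tensor LAMB shipped with apex

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
optimizer = FusedLAMB(model.parameters(), lr=1e-3, weight_decay=0.01)

for _ in range(10):
    inp = torch.randn(32, 1024, device="cuda")
    loss = model(inp).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # multi-tensor fused kernels apply the LAMB update
```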
- 28 May, 2020 (1 commit)
  - Max V. Irgiznov authored
- 27 May, 2020 (1 commit)
  - Kevin Stephano authored: Update Softmax in multihead attention to use the current CUDA stream instead of the default CUDA stream (#843)
    * Adding C++ Multihead Attention implementation to contrib.
    * Add reference test that at least works for forward.
    * Remove CublasLt support from multihead attention.
    * Add new Python version of self attention.
    * Update python model of MHA with backward pass.
    * Fixed Output Linear connection in MHA.
    * Clean up compiles and add documentation to PySelfAttention.
    * Add Encdec Python version of multihead attention. Cleanup files.
    * Tests for self and encdec multihead attention.
    * Add reference pytorch implementation of attention with norm and add.
    * Add cutlass branch definition.
    * Add cutlass download to compile.
    * Add norm/add tests.
    * Add biases to pytorch python versions.
    * Add tests and fix issues with python version of attention masking.
    * Create README.md
    * Update README.md
    * Update README.md
    * Update perf test parameters.
    * Update README.md
    * Update README.md
    * Update README.md
    * Add files via upload
    * Update README.md
    * Update README.md
    * Update README.md
    * Fix matmul1 output tensor size. Fix tests that missed issue.
    * Allow for Z dimensions of 64K and greater on batched GEMMs.
    * Remove redundant imports.
    * General cleanup, remove deprecated or unused functions.
    * Update Multihead Attention's softmax to use the current stream instead of the default stream.
    * Fix setup.py that got messed up in merge with upstream.
    * Update Multihead Attention strided batched gemms to use the current stream instead of the default.
    Co-authored-by: pbialecki <pbialecki@nvidia.com>
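
The 27 May commit above moves apex's multihead-attention kernels off the hard-coded default CUDA stream and onto the current stream, so they stay ordered with work the caller enqueues on other streams. A short sketch of that stream semantics in plain PyTorch, with standard torch ops standing in for the extension's kernels:

```python
import torch

side = torch.cuda.Stream()                  # a non-default stream owned by the caller
x = torch.randn(1024, 1024, device="cuda")

with torch.cuda.stream(side):
    # Inside this context the "current" stream is `side`; ops launched here,
    # including extension kernels that query the current stream, are enqueued
    # on `side` and therefore execute in order with the caller's other work.
    assert torch.cuda.current_stream() == side
    y = x @ x

# Before reading `y` from another stream, the caller synchronizes explicitly.
torch.cuda.current_stream().wait_stream(side)
print(y.norm().item())
```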
- 23 May, 2020 (1 commit)
  - Kexin Yu authored
- 22 May, 2020 (1 commit)
  - Kexin Yu authored