- 19 Mar, 2020 (3 commits)
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 18 Mar, 2020 (2 commits)
  - Thor Johnsen authored: Merge branch 'revertable_fused_adam_with_mt_support' of https://github.com/NVIDIA/apex into revertable_fused_adam_with_mt_support.
  - Thor Johnsen authored
- 17 Mar, 2020 (2 commits)
  - Thor Johnsen authored
  - Thor Johnsen authored
- 13 Mar, 2020 (14 commits)
  - Thor Johnsen authored: Merge branch 'revertable_fused_adam_with_mt_support' of https://github.com/NVIDIA/apex into revertable_fused_adam_with_mt_support (rebasing).
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 12 Mar, 2020 (11 commits)
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 11 Mar, 2020 (2 commits)
  - ptrblck authored:
    - Disable ninja for multihead_attn
    - Fix getCurrentStream in multihead_attn
    - Co-authored-by: pbialecki <pbialecki@nvidia.com>
  - Tomasz Grel authored:
    - Do not unscale the gradients if the loss scale equals 1
    - Disable unscaling for loss scale == 1 only for static scaling
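The change above amounts to skipping a pointless pass: with a static loss scale of exactly 1.0, dividing every gradient by the scale is a no-op. A minimal sketch of the idea, not apex's actual implementation (`unscale_grads` is a hypothetical helper); a dynamic scaler still runs the pass because it also inspects gradients for inf/NaN:

```python
import torch

def unscale_grads(params, loss_scale, dynamic):
    # With a static scale of exactly 1.0 the division is a no-op, so skip it.
    # A dynamic scaler keeps the pass, since it also checks for inf/NaN.
    if not dynamic and loss_scale == 1.0:
        return
    inv_scale = 1.0 / loss_scale
    for p in params:
        if p.grad is not None:
            p.grad.mul_(inv_scale)

# Usage: gradients produced from a loss that was multiplied by the scale.
model = torch.nn.Linear(4, 4)
(model(torch.randn(2, 4)).sum() * 128.0).backward()
unscale_grads(model.parameters(), loss_scale=128.0, dynamic=False)
```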
- 02 Mar, 2020 (1 commit)
- 27 Feb, 2020 (1 commit)
  - mcarilli authored:
    - NHWC support for multi tensor apply
    - Compilation fix for version <= 1.4
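For context, multi tensor apply launches one fused CUDA kernel over lists of tensors; the commit above lets those lists contain NHWC (channels_last) tensors. A hedged sketch of exercising it, assuming apex is installed with its CUDA extensions (amp_C) and a GPU is available; the scale op and overflow buffer follow apex's usual calling pattern:

```python
import torch
import amp_C
from apex.multi_tensor_apply import multi_tensor_applier

# Flag buffer the kernel sets if it encounters inf/NaN.
overflow_buf = torch.zeros(1, dtype=torch.int, device='cuda')

# NHWC tensors: 4D, converted to channels_last memory format.
srcs = [torch.randn(8, 32, 16, 16, device='cuda').to(memory_format=torch.channels_last)
        for _ in range(4)]
dsts = [torch.empty_like(t) for t in srcs]

# Copy srcs into dsts while multiplying by 0.5, in one fused launch.
multi_tensor_applier(amp_C.multi_tensor_scale, overflow_buf, [srcs, dsts], 0.5)
```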
- 25 Feb, 2020 (3 commits)
  - ptrblck authored
  - ptrblck authored
  - Saransh Karira authored
- 24 Feb, 2020 (1 commit)
  - Kevin Stephano authored:
    - Adding C++ Multihead Attention implementation to contrib
    - Add reference test that at least works for forward
    - Remove CublasLt support from multihead attention
    - Add new Python version of self attention
    - Update Python model of MHA with backward pass
    - Fixed Output Linear connection in MHA
    - Clean up compiles and add documentation to PySelfAttention
    - Add Encdec Python version of multihead attention; clean up files
    - Tests for self and encdec multihead attention
    - Add reference PyTorch implementation of attention with norm and add
    - Add cutlass branch definition
    - Add cutlass download to compile
    - Add norm/add tests
    - Add biases to PyTorch Python versions
    - Add tests and fix issues with Python version of attention masking
    - Create README.md
    - Update README.md
    - Update README.md
    - Update perf test parameters
    - Update README.md
    - Update README.md
    - Update README.md
    - Add files via upload
    - Update README.md
    - Update README.md
    - Update README.md
    - Fix matmul1 output tensor size; fix tests that missed issue
    - Allow for Z dimensions of 64K and greater on batched GEMMs
    - Remove redundant imports
    - General cleanup, remove deprecated or unused functions
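A hedged usage sketch of the new contrib self-attention module added by this commit; it assumes apex was built with the contrib multihead_attn extensions and follows the calling convention described in the module's README (constructor and forward arguments may differ between apex versions), with inputs in the (seq_len, batch, embed_dim) layout:

```python
import torch
from apex.contrib.multihead_attn import SelfMultiheadAttn

seq_len, batch, embed_dim, heads = 64, 8, 1024, 16

# impl='fast' selects the fused C++/CUDA path; argument names assumed from the README.
attn = SelfMultiheadAttn(embed_dim, heads, dropout=0.1, bias=False,
                         include_norm_add=False, impl='fast').cuda().half()

# Inputs are (seq_len, batch, embed_dim), matching the fairseq-style layout.
x = torch.randn(seq_len, batch, embed_dim, device='cuda', dtype=torch.half)
out, _ = attn(x, x, x, key_padding_mask=None,
              need_weights=False, attn_mask=None, is_training=True)
```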