- 31 Mar, 2020 1 commit
Jeff Bowles authored
- 25 Mar, 2020 1 commit
msbaines authored
The cuda kernel used by fused-adam was using the default stream on the default device. The kernel needs to use the same device as the parameter tensor. Fixed by using a context manager to set the correct default device. For the use_mt case, an error is raised instead. Alternatively, the use_mt case could launch one kernel per cuda device. The non-contrib version will also need to be fixed. Co-authored-by: Mandeep Singh Baines <msb@fb.com>
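A minimal sketch of the fix described above, assuming a hypothetical `fused_kernel` launcher and `apply_fused_update` helper (not the actual apex code): the kernel launch is wrapped in a device context manager so it targets the device that holds the parameter rather than the default device.

```python
import torch

def apply_fused_update(param, fused_kernel, *kernel_args):
    # Without the context manager, a kernel launched from an extension can
    # end up on the current/default CUDA device and its default stream,
    # rather than the device that actually holds `param`.
    with torch.cuda.device(param.device):
        fused_kernel(param, *kernel_args)
```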
- 11 Mar, 2020 2 commits
ptrblck authored
* disable ninja for multihead_attn
* fix getCurrentStream in multihead_attn
Co-authored-by: pbialecki <pbialecki@nvidia.com>
Tomasz Grel authored
* Do not unscale the gradients if the loss scale is equal to 1
* Disable unscaling for loss scale == 1 only for static scaling
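A rough sketch of the optimization, with illustrative names rather than apex's internals: when static loss scaling is used with a scale of 1, dividing every gradient by the scale is a no-op and can be skipped.

```python
def unscale_grads(params, loss_scale, dynamic_scaling):
    # With static scaling and loss_scale == 1 the gradients are already
    # correctly scaled, so the elementwise division below is pure overhead.
    if not dynamic_scaling and loss_scale == 1.0:
        return
    inv_scale = 1.0 / loss_scale
    for p in params:
        if p.grad is not None:
            p.grad.mul_(inv_scale)
```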
- 02 Mar, 2020 1 commit
- 27 Feb, 2020 1 commit
mcarilli authored
* NHWC support for multi tensor apply
* compilation fix for version <= 1.4
- 25 Feb, 2020 3 commits
ptrblck authored
ptrblck authored
Saransh Karira authored
- 24 Feb, 2020 1 commit
Kevin Stephano authored
* Adding C++ Multihead Attention implementation to contrib.
* Add reference test that at least works for forward.
* Remove CublasLt support from multihead attention.
* Add new Python version of self attention.
* Update python model of MHA with backward pass.
* Fixed Output Linear connection in MHA.
* Clean up compiles and add documentation to PySelfAttention.
* Add Encdec Python version of multihead attention. Cleanup files.
* Tests for self and encdec multihead attention.
* Add reference pytorch implementation of attention with norm and add.
* Add cutlass branch definition.
* Add cutlass download to compile.
* Add norm/add tests.
* Add biases to pytorch python versions.
* Add tests and fix issues with python version of attention masking.
* Create README.md
* Update README.md
* Update README.md
* Update perf test parameters.
* Update README.md
* Update README.md
* Update README.md
* Add files via upload
* Update README.md
* Update README.md
* Update README.md
* Fix matmul1 output tensor size. Fix tests that missed issue.
* Allow for Z dimensions of 64K and greater on batched GEMMs.
* remove redundant imports
* general cleanup, remove deprecated or unused functions
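For orientation, a minimal plain-PyTorch sketch of the self-attention + residual-add + layer-norm pattern that the reference implementation above covers; the module and argument names here are illustrative, and the exact norm/add ordering in the contrib code may differ.

```python
import torch
import torch.nn as nn

class SelfAttnAddNorm(nn.Module):
    """Self-attention followed by residual add and layer norm."""
    def __init__(self, embed_dim, num_heads, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x, attn_mask=None):
        # x: (seq_len, batch, embed_dim), the layout nn.MultiheadAttention expects
        out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        return self.norm(x + out)

q = torch.randn(32, 4, 1024)             # seq_len=32, batch=4, embed_dim=1024
layer = SelfAttnAddNorm(embed_dim=1024, num_heads=16)
print(layer(q).shape)                     # torch.Size([32, 4, 1024])
```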
- 15 Feb, 2020 1 commit
Deyu Fu authored
- 10 Feb, 2020 1 commit
Ayla Khan authored
The actual flag is --opt_level; copy-pasting the example as written results in an unrecognized arguments error.
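An illustrative argparse snippet (not the actual example script) showing why the flag spelling matters: only the exact option string defined by the parser is accepted, and anything else fails with an "unrecognized arguments" error.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--opt_level", type=str, default="O1")

args = parser.parse_args(["--opt_level", "O2"])   # matches the defined flag
print(args.opt_level)                             # "O2"
# parser.parse_args(["--opt-level", "O2"])        # error: unrecognized arguments
```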
- 06 Feb, 2020 1 commit
Kevin Stephano authored
* Adding C++ Multihead Attention implementation to contrib.
* Add reference test that at least works for forward.
* Remove CublasLt support from multihead attention.
* Add new Python version of self attention.
* Update python model of MHA with backward pass.
* Fixed Output Linear connection in MHA.
* Clean up compiles and add documentation to PySelfAttention.
* Add Encdec Python version of multihead attention. Cleanup files.
* Tests for self and encdec multihead attention.
* Add reference pytorch implementation of attention with norm and add.
* Add cutlass branch definition.
* Add cutlass download to compile.
* Add norm/add tests.
* Add biases to pytorch python versions.
* Add tests and fix issues with python version of attention masking.
* Create README.md
* Update README.md
* Update README.md
* Update perf test parameters.
* Update README.md
* Update README.md
* Update README.md
* Add f...
- 05 Feb, 2020 1 commit
Kexin Yu authored
* updated apex.contrib.optimizers.FP16_Optimizer and FusedSGD
* fix attribute name mismatch in state_dict() and load_state_dict()
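A small illustrative sketch (class and key names are hypothetical) of the kind of mismatch the second bullet fixes: state_dict() and load_state_dict() have to agree on attribute and key names, or a saved checkpoint cannot be restored.

```python
class Fp16OptimizerWrapper:
    """Toy wrapper illustrating matching keys in the save/load paths."""
    def __init__(self, inner_optimizer):
        self.optimizer = inner_optimizer
        self.cur_scale = 2.0 ** 16

    def state_dict(self):
        return {
            "optimizer_state_dict": self.optimizer.state_dict(),
            "cur_scale": self.cur_scale,   # key must match load_state_dict()
        }

    def load_state_dict(self, state_dict):
        self.optimizer.load_state_dict(state_dict["optimizer_state_dict"])
        self.cur_scale = state_dict["cur_scale"]
```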
- 27 Jan, 2020 1 commit
Vitaly Fedyunin authored
- 21 Jan, 2020 1 commit
jjsjann123 authored
- 08 Jan, 2020 1 commit
ptrblck authored
* add WAR for pip>=19.3.1
* remove pipmain, use extras_require instead
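A hedged sketch of the packaging change (package and extra names are placeholders): optional dependencies declared through setuptools' extras_require instead of importing pip and calling pip.main(), which stopped working with pip >= 19.3.1.

```python
from setuptools import setup, find_packages

setup(
    name="example_pkg",
    version="0.1.0",
    packages=find_packages(),
    # Installed on demand with: pip install "example_pkg[dev]"
    extras_require={
        "dev": ["pytest", "flake8"],
    },
)
```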
- 18 Dec, 2019 2 commits
- 05 Dec, 2019 1 commit
Neil Tenenholtz authored
- 03 Dec, 2019 1 commit
Michael Carilli authored
- 22 Nov, 2019 1 commit
Roshan Rao authored
- 06 Nov, 2019 1 commit
jjsjann123 authored
- 30 Oct, 2019 1 commit
mcarilli authored
- 23 Oct, 2019 1 commit
Bram Vanroy authored
- 22 Oct, 2019 1 commit
Michael Carilli authored
- 19 Oct, 2019 1 commit
Aron Hoffmann authored
Made the patched optimizer step function a full method, not simply a function stored as an instance member (#553)
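An illustrative contrast (not the amp source) of the two approaches: a plain function assigned to an instance attribute is not bound to the instance, whereas types.MethodType produces a full bound method that receives `self`.

```python
import types

class Optimizer:
    def step(self):
        print("original step")

def patched_step(self, closure=None):
    print("patched step on", type(self).__name__)

opt = Optimizer()

# Stored as a bare instance attribute, the call fails because `self` is
# never passed in:
#   opt.step = patched_step
#   opt.step()   # TypeError: patched_step() missing 1 required argument: 'self'

# Bound as a real method instead:
opt.step = types.MethodType(patched_step, opt)
opt.step()       # prints: patched step on Optimizer
```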
- 10 Oct, 2019 2 commits
Michael Carilli authored
mcarilli authored
- 09 Oct, 2019 2 commits
Bram Vanroy authored
Marek Kolodziej authored
- 08 Oct, 2019 1 commit
Jan Schlüter authored
- 04 Oct, 2019 1 commit
Deyu Fu authored
* move previous fused_adam and fp16_optimizer to contrib
* make build contrib.fused_adam optional
* change build option name
* remove unnecessary try import
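A rough sketch of an optional-extension build in setup.py; the flag name, module name, and source paths below are placeholders, not necessarily the option this commit introduced.

```python
import sys
from setuptools import setup

ext_modules = []
cmdclass = {}

# Only build the contrib CUDA extension when explicitly requested,
# e.g.: python setup.py install --fused_adam   (flag name is illustrative)
if "--fused_adam" in sys.argv:
    sys.argv.remove("--fused_adam")
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension
    ext_modules.append(CUDAExtension(
        name="fused_adam_cuda",
        sources=["csrc/fused_adam_cuda.cpp", "csrc/fused_adam_cuda_kernel.cu"],
    ))
    cmdclass["build_ext"] = BuildExtension

setup(name="example_contrib", ext_modules=ext_modules, cmdclass=cmdclass)
```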
- 03 Oct, 2019 1 commit
ptrblck authored
* increase atol for Half-Float comparison to 1.5e-4
* disable tests for different opt_levels
* reset atol
* add bitwise accurate comparison
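For reference, a small illustrative snippet of the two comparison styles mentioned above: a tolerance-based check with atol=1.5e-4 versus an exact, bitwise-accurate check.

```python
import torch

a = torch.randn(1024)
b = a + 1e-5   # small discrepancy on the order of half-precision rounding

approx_equal  = torch.allclose(a, b, rtol=0, atol=1.5e-4)  # tolerance-based: True
bitwise_equal = torch.equal(a, b)                          # exact match: False
print(approx_equal, bitwise_equal)
```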
- 02 Oct, 2019 1 commit
- 13 Sep, 2019 1 commit
mcarilli authored
- 12 Sep, 2019 1 commit
Youngjin Kim authored
- 11 Sep, 2019 1 commit
jjsjann123 authored
- 06 Sep, 2019 2 commits