- 19 Mar, 2020 (3 commits)
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 18 Mar, 2020 (2 commits)
  - Thor Johnsen authored: Merge branch 'revertable_fused_adam_with_mt_support' of https://github.com/NVIDIA/apex into revertable_fused_adam_with_mt_support.
  - Thor Johnsen authored
- 17 Mar, 2020 (2 commits)
  - Thor Johnsen authored
  - Thor Johnsen authored
- 13 Mar, 2020 (14 commits)
  - Thor Johnsen authored: Merge branch 'revertable_fused_adam_with_mt_support' of https://github.com/NVIDIA/apex into revertable_fused_adam_with_mt_support (rebasing).
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 12 Mar, 2020 (11 commits)
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 11 Mar, 2020 (2 commits)
  - ptrblck authored:
    - Disable ninja for multihead_attn
    - Fix getCurrentStream in multihead_attn
    - Co-authored-by: pbialecki <pbialecki@nvidia.com>
  - Tomasz Grel authored:
    - Do not unscale the gradients if the loss scale equals 1
    - Disable unscaling for loss scale == 1 only for static scaling
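The change above amounts to skipping a pointless pass: with a static loss scale of exactly 1.0, dividing every gradient by the scale is a no-op. A minimal sketch of the idea, not apex's actual implementation (`unscale_grads` is a hypothetical helper); a dynamic scaler still runs the pass because it also inspects gradients for inf/NaN:

```python
import torch

def unscale_grads(params, loss_scale, dynamic):
    # With a static scale of exactly 1.0 the division is a no-op, so skip it.
    # A dynamic scaler keeps the pass, since it also checks for inf/NaN.
    if not dynamic and loss_scale == 1.0:
        return
    inv_scale = 1.0 / loss_scale
    for p in params:
        if p.grad is not None:
            p.grad.mul_(inv_scale)

# Usage: gradients produced from a loss that was multiplied by the scale.
model = torch.nn.Linear(4, 4)
(model(torch.randn(2, 4)).sum() * 128.0).backward()
unscale_grads(model.parameters(), loss_scale=128.0, dynamic=False)
```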
- 02 Mar, 2020 (1 commit)
- 27 Feb, 2020 (1 commit)
  - mcarilli authored:
    - NHWC support for multi tensor apply
    - Compilation fix for version <= 1.4
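For context, multi tensor apply launches one fused CUDA kernel over lists of tensors; the commit above lets those lists contain NHWC (channels_last) tensors. A hedged sketch of exercising it, assuming apex is installed with its CUDA extensions (amp_C) and a GPU is available; the scale op and overflow buffer follow apex's usual calling pattern:

```python
import torch
import amp_C
from apex.multi_tensor_apply import multi_tensor_applier

# Flag buffer the kernel sets if it encounters inf/NaN.
overflow_buf = torch.zeros(1, dtype=torch.int, device='cuda')

# NHWC tensors: 4D, converted to channels_last memory format.
srcs = [torch.randn(8, 32, 16, 16, device='cuda').to(memory_format=torch.channels_last)
        for _ in range(4)]
dsts = [torch.empty_like(t) for t in srcs]

# Copy srcs into dsts while multiplying by 0.5, in one fused launch.
multi_tensor_applier(amp_C.multi_tensor_scale, overflow_buf, [srcs, dsts], 0.5)
```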
- 25 Feb, 2020 (3 commits)
  - ptrblck authored
  - ptrblck authored
  - Saransh Karira authored
- 24 Feb, 2020 (1 commit)
  - Kevin Stephano authored:
    - Adding C++ Multihead Attention implementation to contrib
    - Add reference test that at least works for forward
    - Remove CublasLt support from multihead attention
    - Add new Python version of self attention
    - Update Python model of MHA with backward pass
    - Fixed Output Linear connection in MHA
    - Clean up compiles and add documentation to PySelfAttention
    - Add Encdec Python version of multihead attention; clean up files
    - Tests for self and encdec multihead attention
    - Add reference PyTorch implementation of attention with norm and add
    - Add cutlass branch definition
    - Add cutlass download to compile
    - Add norm/add tests
    - Add biases to PyTorch Python versions
    - Add tests and fix issues with Python version of attention masking
    - Create README.md
    - Update README.md
    - Update README.md
    - Update perf test parameters
    - Update README.md
    - Update README.md
    - Update README.md
    - Add files via upload
    - Update README.md
    - Update README.md
    - Update README.md
    - Fix matmul1 output tensor size; fix tests that missed issue
    - Allow for Z dimensions of 64K and greater on batched GEMMs
    - Remove redundant imports
    - General cleanup, remove deprecated or unused functions
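A hedged usage sketch of the new contrib self-attention module added by this commit; it assumes apex was built with the contrib multihead_attn extensions and follows the calling convention described in the module's README (constructor and forward arguments may differ between apex versions), with inputs in the (seq_len, batch, embed_dim) layout:

```python
import torch
from apex.contrib.multihead_attn import SelfMultiheadAttn

seq_len, batch, embed_dim, heads = 64, 8, 1024, 16

# impl='fast' selects the fused C++/CUDA path; argument names assumed from the README.
attn = SelfMultiheadAttn(embed_dim, heads, dropout=0.1, bias=False,
                         include_norm_add=False, impl='fast').cuda().half()

# Inputs are (seq_len, batch, embed_dim), matching the fairseq-style layout.
x = torch.randn(seq_len, batch, embed_dim, device='cuda', dtype=torch.half)
out, _ = attn(x, x, x, key_padding_mask=None,
              need_weights=False, attn_mask=None, is_training=True)
```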