Commits · 35e86d3ded5f8eb84e423df357ddf63ec4d62900 · OpenDAS / apex

17 Mar, 2020 2 commits
- add additional loop for lists of params when loading state_dict in... · 35e86d3d
  Kexin Yu authored Mar 17, 2020
```
add additional loop for lists of params when loading state_dict in apex.contrib.optimizers.FP16_Optimizer
```
  35e86d3d
- Merge remote-tracking branch 'upstream/master' · 93f91cde
  Kexin Yu authored Mar 17, 2020
  
  93f91cde
11 Mar, 2020 2 commits

Fix deprecated calls in multihead_attn and ninja build failure (#746) · 80b90b9d

ptrblck authored Mar 11, 2020



* disable ninja for multihead_attn

* fix getCurrentStream in multihead_attn
Co-authored-by: pbialecki <pbialecki@nvidia.com>

80b90b9d

Do not unscale the gradients if loss scale equal to 1 (#748) · 20d00ab1

Tomasz Grel authored Mar 11, 2020

* Do not unscale the gradients if loss scale equal to 1

* Disable unscaling loss scale == 1 only for static scaling

20d00ab1

02 Mar, 2020 1 commit
- Revert "remove gencode from multihead_attn build (#731)" · 5633f6db
  pbialecki authored Mar 01, 2020
```
This reverts commit 92b3b9a9.
```
  5633f6db
27 Feb, 2020 1 commit
- NHWC support for multi tensor apply (#732) · de6378f5
  mcarilli authored Feb 26, 2020
```
* NHWC support for multi tensor apply

* compilation fix for version<=1.4
```
  de6378f5
25 Feb, 2020 3 commits
- remove gencode from multihead_attn build (#731) · 92b3b9a9
  ptrblck authored Feb 25, 2020
  
  92b3b9a9
- remove duplicated multihead_attn install (#729) · 5f6b9b0e
  ptrblck authored Feb 24, 2020
  
  5f6b9b0e
- Adding 'ctc_loss' to the list of FP32 funcs (#722) · 93cabd5d
  Saransh Karira authored Feb 25, 2020
  
  93cabd5d
24 Feb, 2020 1 commit

Change to Multihead Attention to allow Batched GEMMs larger than 64K. (#728) · 1733946a

Kevin Stephano authored Feb 24, 2020

* Adding C++ Multihead Attention implementation to contrib.

* Add reference test that at least works for forward.

* Remove CublasLt support from multihead attention.

* Add new Python version of self attention.

* Update python model of MHA with backward pass.

* Fixed Output Linear connection in MHA.

* Clean up compiles and add documentation to PySelfAttention.

* Add Encdec Python version of multihead attention.  Cleanup files.

* Tests for self and encdec multihead attention.

* Add reference pytorch implementation of attention with norm and add.

* Add cutlass branch definition.

* Add cutlass download to compile.

* Add norm/add tests.

* Add biases to pytorch python versions.

* Add tests and fix issues with python version of attention masking.

* Create README.md

* Update README.md

* Update README.md

* Update perf test parameters.

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Fix matmul1 output tensor size.  Fix tests that missed issue.

* Allow for Z dimensions of 64K and greater on batched GEMMs.

* remove redundant imports

* general cleanup, remove deprecated or unused functions

1733946a

15 Feb, 2020 1 commit
- change include_dirs to abs path (#719) · 50338df6
  Deyu Fu authored Feb 14, 2020
  
  50338df6
10 Feb, 2020 1 commit
- Fix opt_level command line arg in instructions. (#713) · 5b71d369
  Ayla Khan authored Feb 10, 2020
```
Actual flag is --opt_level and copy pasting the example results in an unrecognized arguments error.
```
  5b71d369
06 Feb, 2020 1 commit

Add Fast Multihead Attention to APEX Contrib (#697) · 3f94528e

Kevin Stephano authored Feb 06, 2020

* Adding C++ Multihead Attention implementation to contrib.

* Add reference test that at least works for forward.

* Remove CublasLt support from multihead attention.

* Add new Python version of self attention.

* Update python model of MHA with backward pass.

* Fixed Output Linear connection in MHA.

* Clean up compiles and add documentation to PySelfAttention.

* Add Encdec Python version of multihead attention.  Cleanup files.

* Tests for self and encdec multihead attention.

* Add reference pytorch implementation of attention with norm and add.

* Add cutlass branch definition.

* Add cutlass download to compile.

* Add norm/add tests.

* Add biases to pytorch python versions.

* Add tests and fix issues with python version of attention masking.

* Create README.md

* Update README.md

* Update README.md

* Update perf test parameters.

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Fix matmul1 output tensor size.  Fix tests that missed issue.

3f94528e

05 Feb, 2020 3 commits
- Fix attribute name mismatch in state_dict() and load_state_dict() (#704) · 494f8ab3
  Kexin Yu authored Feb 05, 2020
```
* updated apex.contrib.optimizers.FP16_Optimizer and FusedSGD

* fix attribute name mismatch in state_dict() and load_state_dict()
```
  494f8ab3
- fix attribute name mismatch in state_dict() and load_state_dict() · 33082d2b
  Kexin Yu authored Feb 05, 2020
  
  33082d2b
- Merge branch 'master' of https://github.com/NVIDIA/apex · 858d7899
  Kexin Yu authored Feb 05, 2020
  
  858d7899
27 Jan, 2020 1 commit
- Channels last support (#668) · 2ca894da
  Vitaly Fedyunin authored Jan 27, 2020
  
  2ca894da
21 Jan, 2020 1 commit
- removing build target sm_70 from bnp (#683) · b66ffc1d
  jjsjann123 authored Jan 20, 2020
  
  b66ffc1d
08 Jan, 2020 1 commit
- add WAR for pip>=19.3.1 (#652) · b5a7c5f9
  ptrblck authored Jan 08, 2020
```
* add WAR for pip>=19.3.1

* remove pipmain, use extras_require instead
```
  b5a7c5f9
18 Dec, 2019 2 commits
- fix beta2 => beta1 (#661) · 0ce8ad3e
  bonlime authored Dec 18, 2019
  
  0ce8ad3e
- updated apex.contrib.optimizers.FP16_Optimizer and FusedSGD (#657) · c19ee275
  Kexin Yu authored Dec 18, 2019
  
  c19ee275
17 Dec, 2019 1 commit
- updated apex.contrib.optimizers.FP16_Optimizer and FusedSGD · 8d2647f8
  Kexin Yu authored Dec 16, 2019
  
  8d2647f8
05 Dec, 2019 1 commit
- Fixing typo in PyProf README (#637) · 4ad9b3bd
  Neil Tenenholtz authored Dec 04, 2019
  
  4ad9b3bd
03 Dec, 2019 1 commit
- Don't check if distributed is initialized on Windows · f37fdf07
  Michael Carilli authored Dec 03, 2019
  
  f37fdf07
22 Nov, 2019 1 commit
- update _amp_state to check distributed on maybe_print (#620) · 82dac9c9
  Roshan Rao authored Nov 22, 2019
  
  82dac9c9
06 Nov, 2019 1 commit
- fixing batchnorm 1d input (#590) · 37cdaf4a
  jjsjann123 authored Nov 06, 2019
  
  37cdaf4a
30 Oct, 2019 1 commit
- Update README.md · 606c3dcc
  mcarilli authored Oct 30, 2019
  
  606c3dcc
23 Oct, 2019 1 commit
- add gelu activation to fp32 list (#564) · 5b29cc13
  Bram Vanroy authored Oct 23, 2019
  
  5b29cc13
22 Oct, 2019 1 commit
- Making the encouragement to use O1 a bit stronger... · 95d6c007
  Michael Carilli authored Oct 22, 2019
  
  95d6c007
19 Oct, 2019 1 commit
- Made the patched optimizer step function a full method, not simply a function... · 4b913261
  Aron Hoffmann authored Oct 20, 2019
```
Made the patched optimizer step function a full method, not simply a function stored as an instance member (#553)
```
  4b913261
10 Oct, 2019 2 commits
- Adding presentation link to sphinx landing page · 08898593
  Michael Carilli authored Oct 10, 2019
  
  08898593
- Adding links to references. (maybe make this a subrepo?) · ec93c75b
  mcarilli authored Oct 10, 2019
  
  ec93c75b
09 Oct, 2019 2 commits
- allow for non-distributed envs (Windows) (#531) · fab319f1
  Bram Vanroy authored Oct 09, 2019
  
  fab319f1
- Fixed tensor core lookup for Turing (#534) · 753c427a
  Marek Kolodziej authored Oct 09, 2019
  
  753c427a
08 Oct, 2019 1 commit
- Include loss scaling in README code example (#523) · e87b5799
  Jan Schlüter authored Oct 08, 2019
  
  e87b5799
04 Oct, 2019 1 commit

move previous fused_adam and fp16_optimizer to contrib (#517) · 1904e48d

Deyu Fu authored Oct 04, 2019

* move previous fused_adam and fp16_optimizer to contrib

* make build contrib.fused_adam optional

* change build option name

* remove unnecessary try import

1904e48d

03 Oct, 2019 1 commit

Disable tests for mixed opt_levels, add bitwise accurate test of parameters (#520) · 0b74bfd9

ptrblck authored Oct 03, 2019

* increase atol for Half-Float comparison to 1.5e-4

* disable tests for different opt_levels

* reset atol

* add bitwise accurate comparison

0b74bfd9

02 Oct, 2019 1 commit
- fix https://github.com/facebookresearch/maskrcnn-benchmark/issues/802 (#516) · 03421e87
  Timothee Cour authored Oct 01, 2019
  
  03421e87
13 Sep, 2019 1 commit
- Seems to work locally (#490) · 3ae89c75
  mcarilli authored Sep 12, 2019
  
  3ae89c75
12 Sep, 2019 1 commit
- Fixed error in convert_syncbn_model function (#380) · e6cb749b
  Youngjin Kim authored Sep 13, 2019
  
  e6cb749b