- 31 Mar, 2020 1 commit
Jeff Bowles authored
- 25 Mar, 2020 1 commit
msbaines authored
The cuda kernel used by fused-adam was using the default stream on the default device. The kernel needs to use the same device as the parameter tensor. Fixed by using a context manager to set the correct default device. For the use_mt case, an error is raised instead. Alternatively, the use_mt case could launch one kernel per cuda device. The non-contrib version will also need to be fixed. Co-authored-by: Mandeep Singh Baines <msb@fb.com>
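A minimal sketch of the fix described above, assuming a hypothetical `fused_kernel` launcher and `apply_fused_update` helper (not the actual apex code): the kernel launch is wrapped in a device context manager so it targets the device that holds the parameter rather than the default device.

```python
import torch

def apply_fused_update(param, fused_kernel, *kernel_args):
    # Without the context manager, a kernel launched from an extension can
    # end up on the current/default CUDA device and its default stream,
    # rather than the device that actually holds `param`.
    with torch.cuda.device(param.device):
        fused_kernel(param, *kernel_args)
```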
- 11 Mar, 2020 2 commits
ptrblck authored
* disable ninja for multihead_attn
* fix getCurrentStream in multihead_attn
Co-authored-by: pbialecki <pbialecki@nvidia.com>
Tomasz Grel authored
* Do not unscale the gradients if the loss scale is equal to 1
* Disable unscaling for loss scale == 1 only for static scaling
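A rough sketch of the optimization, with illustrative names rather than apex's internals: when static loss scaling is used with a scale of 1, dividing every gradient by the scale is a no-op and can be skipped.

```python
def unscale_grads(params, loss_scale, dynamic_scaling):
    # With static scaling and loss_scale == 1 the gradients are already
    # correctly scaled, so the elementwise division below is pure overhead.
    if not dynamic_scaling and loss_scale == 1.0:
        return
    inv_scale = 1.0 / loss_scale
    for p in params:
        if p.grad is not None:
            p.grad.mul_(inv_scale)
```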
- 02 Mar, 2020 1 commit
- 27 Feb, 2020 1 commit
mcarilli authored
* NHWC support for multi tensor apply
* compilation fix for version <= 1.4
- 25 Feb, 2020 3 commits
ptrblck authored
ptrblck authored
Saransh Karira authored
- 24 Feb, 2020 1 commit
Kevin Stephano authored
* Adding C++ Multihead Attention implementation to contrib.
* Add reference test that at least works for forward.
* Remove CublasLt support from multihead attention.
* Add new Python version of self attention.
* Update python model of MHA with backward pass.
* Fixed Output Linear connection in MHA.
* Clean up compiles and add documentation to PySelfAttention.
* Add Encdec Python version of multihead attention. Cleanup files.
* Tests for self and encdec multihead attention.
* Add reference pytorch implementation of attention with norm and add.
* Add cutlass branch definition.
* Add cutlass download to compile.
* Add norm/add tests.
* Add biases to pytorch python versions.
* Add tests and fix issues with python version of attention masking.
* Create README.md
* Update README.md
* Update README.md
* Update perf test parameters.
* Update README.md
* Update README.md
* Update README.md
* Add files via upload
* Update README.md
* Update README.md
* Update README.md
* Fix matmul1 output tensor size. Fix tests that missed issue.
* Allow for Z dimensions of 64K and greater on batched GEMMs.
* remove redundant imports
* general cleanup, remove deprecated or unused functions
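For orientation, a minimal plain-PyTorch sketch of the self-attention + residual-add + layer-norm pattern that the reference implementation above covers; the module and argument names here are illustrative, and the exact norm/add ordering in the contrib code may differ.

```python
import torch
import torch.nn as nn

class SelfAttnAddNorm(nn.Module):
    """Self-attention followed by residual add and layer norm."""
    def __init__(self, embed_dim, num_heads, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x, attn_mask=None):
        # x: (seq_len, batch, embed_dim), the layout nn.MultiheadAttention expects
        out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        return self.norm(x + out)

q = torch.randn(32, 4, 1024)             # seq_len=32, batch=4, embed_dim=1024
layer = SelfAttnAddNorm(embed_dim=1024, num_heads=16)
print(layer(q).shape)                     # torch.Size([32, 4, 1024])
```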
- 15 Feb, 2020 1 commit
Deyu Fu authored
- 10 Feb, 2020 1 commit
Ayla Khan authored
The actual flag is --opt_level; copy-pasting the example as written results in an unrecognized arguments error.
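An illustrative argparse snippet (not the actual example script) showing why the flag spelling matters: only the exact option string defined by the parser is accepted, and anything else fails with an "unrecognized arguments" error.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--opt_level", type=str, default="O1")

args = parser.parse_args(["--opt_level", "O2"])   # matches the defined flag
print(args.opt_level)                             # "O2"
# parser.parse_args(["--opt-level", "O2"])        # error: unrecognized arguments
```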
- 06 Feb, 2020 1 commit
Kevin Stephano authored
* Adding C++ Multihead Attention implementation to contrib.
* Add reference test that at least works for forward.
* Remove CublasLt support from multihead attention.
* Add new Python version of self attention.
* Update python model of MHA with backward pass.
* Fixed Output Linear connection in MHA.
* Clean up compiles and add documentation to PySelfAttention.
* Add Encdec Python version of multihead attention. Cleanup files.
* Tests for self and encdec multihead attention.
* Add reference pytorch implementation of attention with norm and add.
* Add cutlass branch definition.
* Add cutlass download to compile.
* Add norm/add tests.
* Add biases to pytorch python versions.
* Add tests and fix issues with python version of attention masking.
* Create README.md
* Update README.md
* Update README.md
* Update perf test parameters.
* Update README.md
* Update README.md
* Update README.md
* Add f...
- 05 Feb, 2020 1 commit
Kexin Yu authored
* updated apex.contrib.optimizers.FP16_Optimizer and FusedSGD
* fix attribute name mismatch in state_dict() and load_state_dict()
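A small illustrative sketch (class and key names are hypothetical) of the kind of mismatch the second bullet fixes: state_dict() and load_state_dict() have to agree on attribute and key names, or a saved checkpoint cannot be restored.

```python
class Fp16OptimizerWrapper:
    """Toy wrapper illustrating matching keys in the save/load paths."""
    def __init__(self, inner_optimizer):
        self.optimizer = inner_optimizer
        self.cur_scale = 2.0 ** 16

    def state_dict(self):
        return {
            "optimizer_state_dict": self.optimizer.state_dict(),
            "cur_scale": self.cur_scale,   # key must match load_state_dict()
        }

    def load_state_dict(self, state_dict):
        self.optimizer.load_state_dict(state_dict["optimizer_state_dict"])
        self.cur_scale = state_dict["cur_scale"]
```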
- 27 Jan, 2020 1 commit
Vitaly Fedyunin authored
- 21 Jan, 2020 1 commit
jjsjann123 authored
- 08 Jan, 2020 1 commit
ptrblck authored
* add WAR for pip>=19.3.1
* remove pipmain, use extras_require instead
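A hedged sketch of the packaging change (package and extra names are placeholders): optional dependencies declared through setuptools' extras_require instead of importing pip and calling pip.main(), which stopped working with pip >= 19.3.1.

```python
from setuptools import setup, find_packages

setup(
    name="example_pkg",
    version="0.1.0",
    packages=find_packages(),
    # Installed on demand with: pip install "example_pkg[dev]"
    extras_require={
        "dev": ["pytest", "flake8"],
    },
)
```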
- 18 Dec, 2019 2 commits
- 05 Dec, 2019 1 commit
Neil Tenenholtz authored
- 03 Dec, 2019 1 commit
Michael Carilli authored
- 22 Nov, 2019 1 commit
Roshan Rao authored
- 06 Nov, 2019 1 commit
jjsjann123 authored
- 30 Oct, 2019 1 commit
mcarilli authored
- 23 Oct, 2019 1 commit
Bram Vanroy authored
- 22 Oct, 2019 1 commit
Michael Carilli authored
- 19 Oct, 2019 1 commit
Aron Hoffmann authored
Made the patched optimizer step function a full method, not simply a function stored as an instance member (#553)
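An illustrative contrast (not the amp source) of the two approaches: a plain function assigned to an instance attribute is not bound to the instance, whereas types.MethodType produces a full bound method that receives `self`.

```python
import types

class Optimizer:
    def step(self):
        print("original step")

def patched_step(self, closure=None):
    print("patched step on", type(self).__name__)

opt = Optimizer()

# Stored as a bare instance attribute, the call fails because `self` is
# never passed in:
#   opt.step = patched_step
#   opt.step()   # TypeError: patched_step() missing 1 required argument: 'self'

# Bound as a real method instead:
opt.step = types.MethodType(patched_step, opt)
opt.step()       # prints: patched step on Optimizer
```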
- 10 Oct, 2019 2 commits
Michael Carilli authored
mcarilli authored
- 09 Oct, 2019 2 commits
Bram Vanroy authored
Marek Kolodziej authored
- 08 Oct, 2019 1 commit
Jan Schlüter authored
- 04 Oct, 2019 1 commit
Deyu Fu authored
* move previous fused_adam and fp16_optimizer to contrib
* make build contrib.fused_adam optional
* change build option name
* remove unnecessary try import
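A rough sketch of an optional-extension build in setup.py; the flag name, module name, and source paths below are placeholders, not necessarily the option this commit introduced.

```python
import sys
from setuptools import setup

ext_modules = []
cmdclass = {}

# Only build the contrib CUDA extension when explicitly requested,
# e.g.: python setup.py install --fused_adam   (flag name is illustrative)
if "--fused_adam" in sys.argv:
    sys.argv.remove("--fused_adam")
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension
    ext_modules.append(CUDAExtension(
        name="fused_adam_cuda",
        sources=["csrc/fused_adam_cuda.cpp", "csrc/fused_adam_cuda_kernel.cu"],
    ))
    cmdclass["build_ext"] = BuildExtension

setup(name="example_contrib", ext_modules=ext_modules, cmdclass=cmdclass)
```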
- 03 Oct, 2019 1 commit
ptrblck authored
* increase atol for Half-Float comparison to 1.5e-4
* disable tests for different opt_levels
* reset atol
* add bitwise accurate comparison
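For reference, a small illustrative snippet of the two comparison styles mentioned above: a tolerance-based check with atol=1.5e-4 versus an exact, bitwise-accurate check.

```python
import torch

a = torch.randn(1024)
b = a + 1e-5   # small discrepancy on the order of half-precision rounding

approx_equal  = torch.allclose(a, b, rtol=0, atol=1.5e-4)  # tolerance-based: True
bitwise_equal = torch.equal(a, b)                          # exact match: False
print(approx_equal, bitwise_equal)
```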
- 02 Oct, 2019 1 commit
- 13 Sep, 2019 1 commit
mcarilli authored
- 12 Sep, 2019 1 commit
Youngjin Kim authored
- 11 Sep, 2019 1 commit
jjsjann123 authored
- 06 Sep, 2019 2 commits