Commits · 93cabd5df060b3783228dd75941ad359be5d391d · OpenDAS / apex

25 Feb, 2020 1 commit
- Adding 'ctc_loss' to the list of FP32 funcs (#722) · 93cabd5d
  Saransh Karira authored Feb 25, 2020
  
24 Feb, 2020 1 commit

- Change to Multihead Attention to allow Batched GEMMs larger than 64K. (#728) · 1733946a
  Kevin Stephano authored Feb 24, 2020

* Adding C++ Multihead Attention implementation to contrib.

* Add reference test that at least works for forward.

* Remove CublasLt support from multihead attention.

* Add new Python version of self attention.

* Update python model of MHA with backward pass.

* Fixed Output Linear connection in MHA.

* Clean up compiles and add documentation to PySelfAttention.

* Add Encdec Python version of multihead attention.  Cleanup files.

* Tests for self and encdec multihead attention.

* Add reference pytorch implementation of attention with norm and add.

* Add cutlass branch definition.

* Add cutlass download to compile.

* Add norm/add tests.

* Add biases to pytorch python versions.

* Add tests and fix issues with python version of attention masking.

* Create README.md

* Update README.md

* Update README.md

* Update perf test parameters.

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Fix matmul1 output tensor size.  Fix tests that missed issue.

* Allow for Z dimensions of 64K and greater on batched GEMMs.

* remove redundant imports

* general cleanup, remove deprecated or unused functions


15 Feb, 2020 1 commit
- change include_dirs to abs path (#719) · 50338df6
  Deyu Fu authored Feb 14, 2020
  
10 Feb, 2020 1 commit
- Fix opt_level command line arg in instructions. (#713) · 5b71d369
  Ayla Khan authored Feb 10, 2020
```
The actual flag is --opt_level; copy-pasting the example results in an "unrecognized arguments" error.
```
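
The failure mode described in this commit can be reproduced with a minimal sketch. The commit does not say what the instructions contained before the fix, so the misspelled flag `--opt-level` below is an assumption, as are the parser, the default value `O1`, and the helper name `parses_ok`. The only detail taken from the commit is that the real flag is `--opt_level`.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for a training script's argument parser.
    # Only the flag name --opt_level comes from the commit message.
    parser = argparse.ArgumentParser(description="hypothetical training script")
    parser.add_argument("--opt_level", type=str, default="O1")
    return parser


def parses_ok(argv):
    # argparse reports unrecognized arguments by printing an error
    # and raising SystemExit, so that is what we catch here.
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        return False


if __name__ == "__main__":
    print(parses_ok(["--opt_level", "O2"]))  # correct spelling
    print(parses_ok(["--opt-level", "O2"]))  # assumed misspelling
```

argparse accepts unambiguous prefixes of long options, but a hyphenated spelling is not a prefix of `--opt_level`, so it is rejected rather than silently matched.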
06 Feb, 2020 1 commit

- Add Fast Multihead Attention to APEX Contrib (#697) · 3f94528e
  Kevin Stephano authored Feb 06, 2020

* Adding C++ Multihead Attention implementation to contrib.

* Add reference test that at least works for forward.

* Remove CublasLt support from multihead attention.

* Add new Python version of self attention.

* Update python model of MHA with backward pass.

* Fixed Output Linear connection in MHA.

* Clean up compiles and add documentation to PySelfAttention.

* Add Encdec Python version of multihead attention.  Cleanup files.

* Tests for self and encdec multihead attention.

* Add reference pytorch implementation of attention with norm and add.

* Add cutlass branch definition.

* Add cutlass download to compile.

* Add norm/add tests.

* Add biases to pytorch python versions.

* Add tests and fix issues with python version of attention masking.

* Create README.md

* Update README.md

* Update README.md

* Update perf test parameters.

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Fix matmul1 output tensor size.  Fix tests that missed issue.


05 Feb, 2020 1 commit

- Fix attribute name mismatch in state_dict() and load_state_dict() (#704) · 494f8ab3
  Kexin Yu authored Feb 05, 2020

* updated apex.contrib.optimizers.FP16_Optimizer and FusedSGD

* fix attribute name mismatch in state_dict() and load_state_dict()


27 Jan, 2020 1 commit
- Channels last support (#668) · 2ca894da
  Vitaly Fedyunin authored Jan 27, 2020
  
21 Jan, 2020 1 commit
- removing build target sm_70 from bnp (#683) · b66ffc1d
  jjsjann123 authored Jan 20, 2020
  
08 Jan, 2020 1 commit
- add WAR for pip>=19.3.1 (#652) · b5a7c5f9
  ptrblck authored Jan 08, 2020
```
* add WAR for pip>=19.3.1

* remove pipmain, use extras_require instead
```
18 Dec, 2019 2 commits
- fix beta2 => beta1 (#661) · 0ce8ad3e
  bonlime authored Dec 18, 2019
  
- updated apex.contrib.optimizers.FP16_Optimizer and FusedSGD (#657) · c19ee275
  Kexin Yu authored Dec 18, 2019
  
05 Dec, 2019 1 commit
- Fixing typo in PyProf README (#637) · 4ad9b3bd
  Neil Tenenholtz authored Dec 04, 2019
  
03 Dec, 2019 1 commit
- Don't check if distributed is initialized on Windows · f37fdf07
  Michael Carilli authored Dec 03, 2019
  
22 Nov, 2019 1 commit
- update _amp_state to check distributed on maybe_print (#620) · 82dac9c9
  Roshan Rao authored Nov 22, 2019
  
06 Nov, 2019 1 commit
- fixing batchnorm 1d input (#590) · 37cdaf4a
  jjsjann123 authored Nov 06, 2019
  
30 Oct, 2019 1 commit
- Update README.md · 606c3dcc
  mcarilli authored Oct 30, 2019
  
23 Oct, 2019 1 commit
- add gelu activation to fp32 list (#564) · 5b29cc13
  Bram Vanroy authored Oct 23, 2019
  
22 Oct, 2019 1 commit
- Making the encouragement to use O1 a bit stronger... · 95d6c007
  Michael Carilli authored Oct 22, 2019
  
19 Oct, 2019 1 commit
- Made the patched optimizer step function a full method, not simply a function... · 4b913261
  Aron Hoffmann authored Oct 20, 2019
```
Made the patched optimizer step function a full method, not simply a function stored as an instance member (#553)
```
10 Oct, 2019 2 commits
- Adding presentation link to sphinx landing page · 08898593
  Michael Carilli authored Oct 10, 2019
  
- Adding links to references. (maybe make this a subrepo?) · ec93c75b
  mcarilli authored Oct 10, 2019
  
09 Oct, 2019 2 commits
- allow for non-distributed envs (Windows) (#531) · fab319f1
  Bram Vanroy authored Oct 09, 2019
  
- Fixed tensor core lookup for Turing (#534) · 753c427a
  Marek Kolodziej authored Oct 09, 2019
  
08 Oct, 2019 1 commit
- Include loss scaling in README code example (#523) · e87b5799
  Jan Schlüter authored Oct 08, 2019
  
04 Oct, 2019 1 commit

- move previous fused_adam and fp16_optimizer to contrib (#517) · 1904e48d
  Deyu Fu authored Oct 04, 2019

* move previous fused_adam and fp16_optimizer to contrib

* make build contrib.fused_adam optional

* change build option name

* remove unnecessary try import


03 Oct, 2019 1 commit

- Disable tests for mixed opt_levels, add bitwise accurate test of parameters (#520) · 0b74bfd9
  ptrblck authored Oct 03, 2019

* increase atol for Half-Float comparison to 1.5e-4

* disable tests for different opt_levels

* reset atol

* add bitwise accurate comparison


02 Oct, 2019 1 commit
- fix https://github.com/facebookresearch/maskrcnn-benchmark/issues/802 (#516) · 03421e87
  Timothee Cour authored Oct 01, 2019
  
13 Sep, 2019 1 commit
- Seems to work locally (#490) · 3ae89c75
  mcarilli authored Sep 12, 2019
  
12 Sep, 2019 1 commit
- Fixed error in convert_syncbn_model function (#380) · e6cb749b
  Youngjin Kim authored Sep 13, 2019
  
11 Sep, 2019 1 commit
- removing nvtx range used for debugging (#485) · ad98cc5f
  jjsjann123 authored Sep 11, 2019
  
06 Sep, 2019 2 commits
- Fix for #456 (#477) · 325f5a0b
  mcarilli authored Sep 05, 2019
```
* Pushing for build tests

* Contrib files

* Removing deprecated checks
```
- LARC needs no Variable (#461) · 1bf0d8d4
  Tony-Y authored Sep 06, 2019
```
Remove torch.autograd.Variable
```
03 Sep, 2019 2 commits
- Fix issues in fused_adam (#469) · 7fa74925
  Deyu Fu authored Sep 03, 2019
```
* move import of amp_C to __init__()

* make fp16/32 separate lists to support mixed param types, disable double test

* make zero_grad consistent between adam/novograd/lamb
```
- remove deprecated backend.FunctionBackend calls (#466) · 35a85789
  ptrblck authored Sep 03, 2019
  
30 Aug, 2019 1 commit
- [novograd] move exp_avg_sq to param device in load_state_dict (#459) · 53eae198
  Deyu Fu authored Aug 29, 2019
  
27 Aug, 2019 5 commits
- Enable Checkpointing (#420) · dec4fdd6
  ptrblck authored Aug 27, 2019
```
* add state_dict, load_state_dict

* add test_restoring, test_loss_scale_decrease

* disable amp outputs for checkpoint tests

* add test for amp.state_dict, cleanup

* add state_dict patch, add test

* fixed testing, cleanup

* add readme for checkpointing

* add docs to source/amp

* add review changes to doc
```
- Deleting test_fp16_optimizer.py · 30ed793e
  Michael Carilli authored Aug 27, 2019
  
- Docstring updates · b9e5d37d
  Michael Carilli authored Aug 27, 2019
  
- Docstring updates · 17e8a552
  Michael Carilli authored Aug 27, 2019
  
- Merge branch 'master' of https://github.com/NVIDIA/apex · ea7c2098
  Michael Carilli authored Aug 27, 2019
  