- 01 Jul, 2020 (1 commit)
  - Kirthi Sivamani authored
- 10 Jun, 2020 (4 commits)
  - Kirthi Sivamani authored
  - Kirthi Sivamani authored
  - Kirthi Sivamani authored
  - Kirthi Sivamani authored
- 01 Jun, 2020 (3 commits)
  - mcarilli authored
    Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
  - Thor Johnsen authored: Remove distributed lamb from __init__.py
  - Thor Johnsen authored
- 31 May, 2020 (6 commits)
  - Thor Johnsen authored: Distributed lamb optimizer
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 30 May, 2020 (19 commits)
  - Thor Johnsen authored (×19)
- 29 May, 2020 (3 commits)
  - Kevin Stephano authored
  - Burc Eryilmaz authored: Fuse dropout and softmax in the backward pass, add bias support to the C++ MHA, add additive mask support, and separate the Q/K/V parameters (#854)
    Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
  - Kexin Yu authored: make FusedLAMB async
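
For context on the Burc Eryilmaz commit above: the operation being fused is the softmax-plus-dropout step of multihead attention. A minimal unfused reference in plain PyTorch, with illustrative names and shapes (this is not apex's API, only the math the fused backward covers):

```python
import torch
import torch.nn.functional as F

def attn_probs_reference(scores, additive_mask=None, p_drop=0.1, training=True):
    """Unfused reference: additive mask -> softmax -> dropout on attention scores."""
    # scores: (batch * heads, seq_q, seq_k); names and shapes are illustrative only
    if additive_mask is not None:
        scores = scores + additive_mask                   # additive mask support
    probs = F.softmax(scores, dim=-1)                     # softmax over the key dimension
    return F.dropout(probs, p=p_drop, training=training)  # dropout on the probabilities

scores = torch.randn(8 * 16, 64, 64, device="cuda", requires_grad=True)
mask = torch.zeros(8 * 16, 64, 64, device="cuda")
attn_probs_reference(scores, mask).sum().backward()  # this backward pass is the part apex fuses
```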
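The Kexin Yu commit above modifies apex's FusedLAMB optimizer. A minimal usage sketch, assuming apex was built with its CUDA extensions; the model and hyperparameters are placeholders:

```python
import torch
from apex.optimizers import FusedLAMB  # fused multi-tensor LAMB shipped with apex

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
optimizer = FusedLAMB(model.parameters(), lr=1e-3, weight_decay=0.01)

for _ in range(10):
    inp = torch.randn(32, 1024, device="cuda")
    loss = model(inp).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # multi-tensor fused kernels apply the LAMB update
```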
- 28 May, 2020 (1 commit)
  - Max V. Irgiznov authored
- 27 May, 2020 (1 commit)
  - Kevin Stephano authored: Update Softmax in multihead attention to use the current CUDA stream instead of the default CUDA stream (#843)
    * Adding C++ Multihead Attention implementation to contrib.
    * Add reference test that at least works for forward.
    * Remove CublasLt support from multihead attention.
    * Add new Python version of self attention.
    * Update python model of MHA with backward pass.
    * Fixed Output Linear connection in MHA.
    * Clean up compiles and add documentation to PySelfAttention.
    * Add Encdec Python version of multihead attention. Cleanup files.
    * Tests for self and encdec multihead attention.
    * Add reference pytorch implementation of attention with norm and add.
    * Add cutlass branch definition.
    * Add cutlass download to compile.
    * Add norm/add tests.
    * Add biases to pytorch python versions.
    * Add tests and fix issues with python version of attention masking.
    * Create README.md
    * Update README.md
    * Update README.md
    * Update perf test parameters.
    * Update README.md
    * Update README.md
    * Update README.md
    * Add files via upload
    * Update README.md
    * Update README.md
    * Update README.md
    * Fix matmul1 output tensor size. Fix tests that missed issue.
    * Allow for Z dimensions of 64K and greater on batched GEMMs.
    * Remove redundant imports.
    * General cleanup, remove deprecated or unused functions.
    * Update Multihead Attention's softmax to use the current stream instead of the default stream.
    * Fix setup.py that got messed up in merge with upstream.
    * Update Multihead Attention strided batched gemms to use the current stream instead of the default.
    Co-authored-by: pbialecki <pbialecki@nvidia.com>
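
The 27 May commit above moves apex's multihead-attention kernels off the hard-coded default CUDA stream and onto the current stream, so they stay ordered with work the caller enqueues on other streams. A short sketch of that stream semantics in plain PyTorch, with standard torch ops standing in for the extension's kernels:

```python
import torch

side = torch.cuda.Stream()                  # a non-default stream owned by the caller
x = torch.randn(1024, 1024, device="cuda")

with torch.cuda.stream(side):
    # Inside this context the "current" stream is `side`; ops launched here,
    # including extension kernels that query the current stream, are enqueued
    # on `side` and therefore execute in order with the caller's other work.
    assert torch.cuda.current_stream() == side
    y = x @ x

# Before reading `y` from another stream, the caller synchronizes explicitly.
torch.cuda.current_stream().wait_stream(side)
print(y.norm().item())
```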
- 23 May, 2020 (1 commit)
  - Kexin Yu authored
- 22 May, 2020 (1 commit)
  - Kexin Yu authored