- 23 Feb, 2021 1 commit
yjk21 authored
- 10 Feb, 2021 1 commit
Shoufa Chen authored
* copy-paste friendly
* fix import container_abcs issue: nightly PyTorch has removed `container_abcs` from `torch._six` (https://github.com/pytorch/pytorch/commit/58eb23378f2a376565a66ac32c93a316c45b6131#diff-b3c160475f0fbe8ad50310f92d3534172ba98203387a962b7dc8f4a23b15cf4dL35)
* keep the existing import for PyTorch 1.7 and earlier
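A minimal compatibility-guard sketch of the idea described above, assuming the usual try/except approach (the exact guard merged in the repo may differ):

```python
# Assumption: illustrative guard, not necessarily the exact fix merged here.
# Nightly PyTorch removed container_abcs from torch._six, so fall back to the
# standard library; PyTorch 1.7 and earlier keep using the old location.
try:
    from torch._six import container_abcs   # PyTorch 1.7 and earlier
except ImportError:
    import collections.abc as container_abcs  # newer / nightly PyTorch
```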
- 20 Jan, 2021 1 commit
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
- 17 Dec, 2020 2 commits
Thor Johnsen authored
Update ASP README to highlight default recipe
jpool-nv authored
The recipe was presented after some non-standard API calls, so this moves the suggested usage up, gives it its own section, and reinforces the suggested usage in the non-standard section.
- 04 Dec, 2020 3 commits
Stas Bekman authored
Kexin Yu authored
* add flag for DistributedAdam: step_support_amp_scaling
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
Burc Eryilmaz authored
* fuse dropout into softmax in fprop for additive mask case
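For reference, an unfused sketch of the forward-path pattern being fused here; the function name and dropout probability are illustrative, not the repo's API:

```python
import torch
import torch.nn.functional as F

def masked_softmax_dropout(scores, additive_mask, p=0.1, training=True):
    # additive_mask holds 0 for kept positions and a large negative value for masked ones
    probs = F.softmax(scores + additive_mask, dim=-1)
    return F.dropout(probs, p=p, training=training)
```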
- 02 Dec, 2020 1 commit
Janusz Lisiecki authored
- resume() is a nested function, and when it loads best_prec1 it creates a local variable that hides the one from the parent function (which refers to the global one). This PR adds `global` so the global variable is modified as intended.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
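A minimal, self-contained sketch of the shadowing issue and the `global` fix; the variable name follows the ImageNet example, the assigned value is illustrative:

```python
best_prec1 = 0  # module-level variable, as in the ImageNet example

def main():
    def resume():
        global best_prec1   # without this, the assignment below creates a local
        best_prec1 = 76.5   # variable that hides the module-level one
    resume()
    print(best_prec1)       # 76.5 with the fix; stays 0 without it

main()
```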
- 01 Dec, 2020 1 commit
Kexin Yu authored
DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
- 19 Oct, 2020 1 commit
lly-zero-one authored
In this PR we mainly optimize the performance of SyncBatchNorm and also fix one potential issue in the welford_parallel kernel implementation. For the performance improvement, we batch the mean/var/count all_gather communication together and send it once in the forward path. We also batch the all_reduce in the backward path, and we add a contiguous call on the input of the welford_parallel kernel. If there is any standard perf benchmark, I would be happy to run it.
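A sketch of the batching idea under assumed shapes (per-channel mean/var of size C and a one-element count): pack the statistics into one buffer so a single all_gather replaces three separate calls. This is an illustration, not the kernel-level code in the PR.

```python
import torch
import torch.distributed as dist

def gather_stats(mean, var, count):
    # mean, var: shape [C]; count: shape [1] (assumed). Pack into one contiguous buffer.
    packed = torch.cat([mean, var, count]).contiguous()
    gathered = [torch.empty_like(packed) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, packed)   # one communication instead of three
    c = mean.numel()
    means  = [g[:c]      for g in gathered]
    vars_  = [g[c:2 * c] for g in gathered]
    counts = [g[2 * c:]  for g in gathered]
    return means, vars_, counts
```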
- 29 Sep, 2020 1 commit
ptrblck authored
- 15 Sep, 2020 1 commit
Thor Johnsen authored
Update asp readme
- 14 Sep, 2020 2 commits
- 15 Aug, 2020 1 commit
mcarilli authored
- 10 Aug, 2020 1 commit
ptrblck authored
Co-authored-by: pbialecki <pbialecki@nvidia.com>
- 06 Aug, 2020 1 commit
ngimel authored
- 05 Aug, 2020 1 commit
ngimel authored
* add device guards to the optimizers
* add untracked file
* set deviceGuard in multi_tensor_apply
* address review comments; fix lamb
* indent
* typo
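A Python-level sketch of the device-guard idea above, assuming CUDA parameters and using a plain SGD update as a stand-in for the fused kernels; the actual change sets a C++ deviceGuard inside multi_tensor_apply.

```python
import torch

def step_with_device_guard(params, lr=1e-3):
    for p in params:
        if p.grad is None:
            continue
        # guard the device owning this parameter so kernels launch on the right GPU
        with torch.cuda.device(p.device):
            p.data.add_(p.grad, alpha=-lr)  # illustrative update only
```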
- 01 Aug, 2020 1 commit
ptrblck authored
- 30 Jul, 2020 1 commit
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
- 23 Jul, 2020 1 commit
Thor Johnsen authored
Asp sparse param dict update
- 22 Jul, 2020 3 commits
Asit authored
Accept custom (layer type:param name) to include in sparse_parameter …
Asit authored
1. Support including a user-supplied custom layer type and its parameter name in sparse_parameter_list. This is useful when users have their own implementation of nn.Linear or nn.Conv2d; for example, the huggingface repo has a custom implementation of nn.Linear called LinearActivation.
2. Print info about layers in the model that are not pruned.
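A hedged usage sketch: the `custom_layer_dict` keyword and the exact call sequence below are assumptions based on this description and the ASP README, not a verified signature, and `LinearActivation` here is a stand-in defined locally rather than the huggingface class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from apex.contrib.sparsity import ASP  # assumes apex is installed with contrib extensions

class LinearActivation(nn.Linear):
    """Stand-in for a custom Linear variant (e.g. a fused Linear + activation)."""
    def forward(self, x):
        return F.gelu(super().forward(x))

model = nn.Sequential(LinearActivation(64, 64), nn.Linear(64, 8)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Register the custom (layer type : parameter names) entry so it is pruned too.
ASP.init_model_for_pruning(
    model,
    mask_calculator="m4n2_1d",                         # default 2:4 recipe per the ASP README
    custom_layer_dict={LinearActivation: ["weight"]},  # keyword name assumed, not verified
)
ASP.init_optimizer_for_pruning(optimizer)
ASP.compute_sparse_masks()
```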
Asit authored
Merge pull request #917 from a-maci/master
- 21 Jul, 2020 1 commit
Thor Johnsen authored
Fixing the case when grads are None
- 20 Jul, 2020 3 commits
- 16 Jul, 2020 2 commits
Thor Johnsen authored
Fixed weight init for fused weight matrices in fused MHA by adding correct gain factor
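A sketch of the idea (not necessarily the repo's exact code): with Q, K and V packed into one fused [3E, E] matrix, Xavier init sees fan_out = 3E instead of E, so a corrective gain restores the per-matrix scale.

```python
import math
import torch

embed_dim = 512
fused_qkv_weight = torch.empty(3 * embed_dim, embed_dim)
# Xavier bound ~ gain * sqrt(6 / (fan_in + fan_out)); matching three separate [E, E]
# matrices requires gain = sqrt((E + 3E) / (E + E)) = sqrt(2).
torch.nn.init.xavier_uniform_(fused_qkv_weight, gain=math.sqrt(2.0))
```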
Thor Johnsen authored
Fixed variable name
- 09 Jul, 2020 1 commit
Szymon Migacz authored
- 06 Jul, 2020 1 commit
jjsjann123 authored
* [sync BN] support non-uniform batch size across process group. TODO: test should be added once cleaned up.
* updating unit tests
* new unit tests for different inputs
* cleaning
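A hedged usage sketch of apex SyncBatchNorm conversion; with this change, ranks in the process group may contribute different local batch sizes. Distributed initialization and the model below are illustrative only.

```python
import torch
import torch.nn as nn
import apex

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16)).cuda()
model = apex.parallel.convert_syncbn_model(model)  # swaps BatchNorm layers for apex SyncBatchNorm
# After this change each rank may feed a different per-rank batch size,
# e.g. 7 samples on rank 0 and 5 on rank 1, and the statistics still reduce correctly.
```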
- 01 Jul, 2020 1 commit
Kirthi Sivamani authored
- 30 Jun, 2020 1 commit
mcarilli authored
* Only attempt to patch Tensor methods if defined
* syntax
Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
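A minimal sketch of the guard described above (the helper and wrapper names are illustrative, not amp's actual internals): only patch a Tensor method when the running PyTorch build defines it.

```python
import torch

def maybe_patch_tensor_method(method_name, make_wrapper):
    if not hasattr(torch.Tensor, method_name):
        return  # method absent in this PyTorch version; skip instead of raising
    original = getattr(torch.Tensor, method_name)
    setattr(torch.Tensor, method_name, make_wrapper(original))
```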
- 23 Jun, 2020 4 commits
- 15 Jun, 2020 1 commit
Thor Johnsen authored
2d masking and sparsity