- 23 Feb, 2021 1 commit
yjk21 authored
- 10 Feb, 2021 1 commit
Shoufa Chen authored
* copy-paste friendly
* fix import container_abcs issue: nightly PyTorch has removed `container_abcs` from `torch._six` (https://github.com/pytorch/pytorch/commit/58eb23378f2a376565a66ac32c93a316c45b6131#diff-b3c160475f0fbe8ad50310f92d3534172ba98203387a962b7dc8f4a23b15cf4dL35)
* keep the existing import for PyTorch 1.7 and earlier
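A minimal compatibility-guard sketch of the idea described above, assuming the usual try/except approach (the exact guard merged in the repo may differ):

```python
# Assumption: illustrative guard, not necessarily the exact fix merged here.
# Nightly PyTorch removed container_abcs from torch._six, so fall back to the
# standard library; PyTorch 1.7 and earlier keep using the old location.
try:
    from torch._six import container_abcs   # PyTorch 1.7 and earlier
except ImportError:
    import collections.abc as container_abcs  # newer / nightly PyTorch
```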
- 20 Jan, 2021 1 commit
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
- 17 Dec, 2020 2 commits
Thor Johnsen authored
Update ASP README to highlight default recipe
jpool-nv authored
The recipe was presented after some non-standard API calls, so this moves the suggested usage up, gives it its own section, and reinforces the suggested usage in the non-standard section.
- 04 Dec, 2020 3 commits
Stas Bekman authored
Kexin Yu authored
* add flag for DistributedAdam: step_support_amp_scaling
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
Burc Eryilmaz authored
* fuse dropout into softmax in fprop for additive mask case
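For reference, an unfused sketch of the forward-path pattern being fused here; the function name and dropout probability are illustrative, not the repo's API:

```python
import torch
import torch.nn.functional as F

def masked_softmax_dropout(scores, additive_mask, p=0.1, training=True):
    # additive_mask holds 0 for kept positions and a large negative value for masked ones
    probs = F.softmax(scores + additive_mask, dim=-1)
    return F.dropout(probs, p=p, training=training)
```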
- 02 Dec, 2020 1 commit
Janusz Lisiecki authored
- resume() is a nested function, and when it loads best_prec1 it creates a local variable that hides the one from the parent function (which refers to the global one). This PR adds `global` so the global variable is modified as intended.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
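A minimal, self-contained sketch of the shadowing issue and the `global` fix; the variable name follows the ImageNet example, the assigned value is illustrative:

```python
best_prec1 = 0  # module-level variable, as in the ImageNet example

def main():
    def resume():
        global best_prec1   # without this, the assignment below creates a local
        best_prec1 = 76.5   # variable that hides the module-level one
    resume()
    print(best_prec1)       # 76.5 with the fix; stays 0 without it

main()
```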
- 01 Dec, 2020 1 commit
Kexin Yu authored
DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
- 19 Oct, 2020 1 commit
lly-zero-one authored
In this PR we mainly optimize the performance of SyncBatchNorm and also fix one potential issue in the welford_parallel kernel implementation. For the performance improvement, we batch the mean/var/count all_gather communication together and send it once in the forward path. We also batch the all_reduce in the backward path, and we add a contiguous call on the input of the welford_parallel kernel. If there is any standard perf benchmark, I would be happy to run it.
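A sketch of the batching idea under assumed shapes (per-channel mean/var of size C and a one-element count): pack the statistics into one buffer so a single all_gather replaces three separate calls. This is an illustration, not the kernel-level code in the PR.

```python
import torch
import torch.distributed as dist

def gather_stats(mean, var, count):
    # mean, var: shape [C]; count: shape [1] (assumed). Pack into one contiguous buffer.
    packed = torch.cat([mean, var, count]).contiguous()
    gathered = [torch.empty_like(packed) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, packed)   # one communication instead of three
    c = mean.numel()
    means  = [g[:c]      for g in gathered]
    vars_  = [g[c:2 * c] for g in gathered]
    counts = [g[2 * c:]  for g in gathered]
    return means, vars_, counts
```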
- 29 Sep, 2020 1 commit
ptrblck authored
- 15 Sep, 2020 1 commit
Thor Johnsen authored
Update asp readme
- 14 Sep, 2020 2 commits
- 15 Aug, 2020 1 commit
mcarilli authored
- 10 Aug, 2020 1 commit
ptrblck authored
Co-authored-by: pbialecki <pbialecki@nvidia.com>
- 06 Aug, 2020 1 commit
ngimel authored
- 05 Aug, 2020 1 commit
ngimel authored
* add device guards to the optimizers
* add untracked file
* set deviceGuard in multi_tensor_apply
* address review comments; fix lamb
* indent
* typo
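A Python-level sketch of the device-guard idea above, assuming CUDA parameters and using a plain SGD update as a stand-in for the fused kernels; the actual change sets a C++ deviceGuard inside multi_tensor_apply.

```python
import torch

def step_with_device_guard(params, lr=1e-3):
    for p in params:
        if p.grad is None:
            continue
        # guard the device owning this parameter so kernels launch on the right GPU
        with torch.cuda.device(p.device):
            p.data.add_(p.grad, alpha=-lr)  # illustrative update only
```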
- 01 Aug, 2020 1 commit
ptrblck authored
- 30 Jul, 2020 1 commit
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
- 23 Jul, 2020 1 commit
Thor Johnsen authored
Asp sparse param dict update
- 22 Jul, 2020 3 commits
Asit authored
Accept custom (layer type:param name) to include in sparse_parameter …
Asit authored
1. Support including a user-supplied custom layer type and its parameter name in sparse_parameter_list. This is useful when users have their own implementation of nn.Linear or nn.Conv2d; for example, the huggingface repo has a custom implementation of nn.Linear called LinearActivation.
2. Print info about layers in the model that are not pruned.
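A hedged usage sketch: the `custom_layer_dict` keyword and the exact call sequence below are assumptions based on this description and the ASP README, not a verified signature, and `LinearActivation` here is a stand-in defined locally rather than the huggingface class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from apex.contrib.sparsity import ASP  # assumes apex is installed with contrib extensions

class LinearActivation(nn.Linear):
    """Stand-in for a custom Linear variant (e.g. a fused Linear + activation)."""
    def forward(self, x):
        return F.gelu(super().forward(x))

model = nn.Sequential(LinearActivation(64, 64), nn.Linear(64, 8)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Register the custom (layer type : parameter names) entry so it is pruned too.
ASP.init_model_for_pruning(
    model,
    mask_calculator="m4n2_1d",                         # default 2:4 recipe per the ASP README
    custom_layer_dict={LinearActivation: ["weight"]},  # keyword name assumed, not verified
)
ASP.init_optimizer_for_pruning(optimizer)
ASP.compute_sparse_masks()
```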
Asit authored
Merge pull request #917 from a-maci/master
- 21 Jul, 2020 1 commit
Thor Johnsen authored
Fixing the case when grads are None
- 20 Jul, 2020 3 commits
- 16 Jul, 2020 2 commits
Thor Johnsen authored
Fixed weight init for fused weight matrices in fused MHA by adding correct gain factor
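A sketch of the idea (not necessarily the repo's exact code): with Q, K and V packed into one fused [3E, E] matrix, Xavier init sees fan_out = 3E instead of E, so a corrective gain restores the per-matrix scale.

```python
import math
import torch

embed_dim = 512
fused_qkv_weight = torch.empty(3 * embed_dim, embed_dim)
# Xavier bound ~ gain * sqrt(6 / (fan_in + fan_out)); matching three separate [E, E]
# matrices requires gain = sqrt((E + 3E) / (E + E)) = sqrt(2).
torch.nn.init.xavier_uniform_(fused_qkv_weight, gain=math.sqrt(2.0))
```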
Thor Johnsen authored
Fixed variable name
- 09 Jul, 2020 1 commit
Szymon Migacz authored
- 06 Jul, 2020 1 commit
jjsjann123 authored
* [sync BN] support non-uniform batch size across process group. TODO: test should be added once cleaned up.
* updating unit tests
* new unit tests for different inputs
* cleaning
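A hedged usage sketch of apex SyncBatchNorm conversion; with this change, ranks in the process group may contribute different local batch sizes. Distributed initialization and the model below are illustrative only.

```python
import torch
import torch.nn as nn
import apex

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16)).cuda()
model = apex.parallel.convert_syncbn_model(model)  # swaps BatchNorm layers for apex SyncBatchNorm
# After this change each rank may feed a different per-rank batch size,
# e.g. 7 samples on rank 0 and 5 on rank 1, and the statistics still reduce correctly.
```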
- 01 Jul, 2020 1 commit
Kirthi Sivamani authored
- 30 Jun, 2020 1 commit
mcarilli authored
* Only attempt to patch Tensor methods if defined
* syntax
Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
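A minimal sketch of the guard described above (the helper and wrapper names are illustrative, not amp's actual internals): only patch a Tensor method when the running PyTorch build defines it.

```python
import torch

def maybe_patch_tensor_method(method_name, make_wrapper):
    if not hasattr(torch.Tensor, method_name):
        return  # method absent in this PyTorch version; skip instead of raising
    original = getattr(torch.Tensor, method_name)
    setattr(torch.Tensor, method_name, make_wrapper(original))
```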
- 23 Jun, 2020 4 commits
- 15 Jun, 2020 1 commit
Thor Johnsen authored
2d masking and sparsity