- 31 Aug, 2021 2 commits
-
-
Jithun Nair authored
add distributed fused lamb
-
Jeff Daily authored
-
- 25 Jun, 2021 2 commits
-
-
Jeff Daily authored
Make torch version check numeric
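The commit title does not show the change itself. As a minimal sketch of why a numeric check matters, assuming the usual `major.minor...` layout of `torch.__version__`: comparing version strings lexically puts "1.10" before "1.8", while comparing integer tuples does not. The helper name `parse_major_minor` is hypothetical and is not taken from the repository.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) as integers from a version string.

    Hypothetical helper for illustration; it tolerates suffixes such as
    "1.10.0a0+git1234" by reading only the leading "major.minor" digits.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


# Tuple comparison is numeric, so 1.10 correctly counts as newer than 1.8.
is_new_enough = parse_major_minor("1.10.0") >= (1, 8)

# String comparison is lexical and gets the same question wrong.
lexical_result = "1.10" >= "1.8"
```

With real PyTorch installed, the same helper could be applied to `torch.__version__`.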
-
Jithun Nair authored
-
- 04 Mar, 2021 3 commits
-
-
Jeff Daily authored
IFU-2020-03-04
-
Jeff Daily authored
-
Peng authored
Revert "pass all TensorListMetadata as pointer to pinned host memory (#13)
-
- 25 Feb, 2021 1 commit
-
-
Jeff Daily authored
This reverts commit bdd481d1.
-
- 23 Feb, 2021 1 commit
-
-
yjk21 authored
-
- 10 Feb, 2021 1 commit
-
-
Shoufa Chen authored
* copy-paste friendly
* fix import container_abcs issue
  Nightly PyTorch has removed `container_abcs` from `torch._six`.
  https://github.com/pytorch/pytorch/commit/58eb23378f2a376565a66ac32c93a316c45b6131#diff-b3c160475f0fbe8ad50310f92d3534172ba98203387a962b7dc8f4a23b15cf4dL35
* keep existing for pytorch1.7 and earlier
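A compatibility import of the kind this commit message describes might look like the sketch below. It is an illustration, not the code from the commit: it falls back on `ImportError` rather than checking the PyTorch version, and it assumes that the standard library's `collections.abc` is an acceptable stand-in for the removed alias.

```python
# Sketch only: prefer the legacy alias where it still exists (PyTorch 1.7 and
# earlier, per the commit message), otherwise use the standard library module.
try:
    from torch._six import container_abcs
except ImportError:
    # Also covers the case where torch itself is not installed.
    import collections.abc as container_abcs


def is_sequence_or_mapping(obj):
    """Check container types through whichever module was imported."""
    return isinstance(obj, (container_abcs.Sequence, container_abcs.Mapping))
```

Code that only uses abstract base classes such as `Sequence` and `Mapping` behaves the same with either import.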
-
- 25 Jan, 2021 1 commit
-
-
Jeff Daily authored
Fix incorrect use of __shfl_down; fix warp size assumptions; update unit tests to exit on failure
-
- 21 Jan, 2021 2 commits
-
-
Jeff Daily authored
-
Jeff Daily authored
use __launch_bounds__(1024) for multi_tensor_apply, re-enable skipped tests
-
- 20 Jan, 2021 1 commit
-
-
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
-
- 19 Jan, 2021 1 commit
-
-
Jeff Daily authored
IFU-2021-01-18
-
- 18 Jan, 2021 5 commits
-
-
Jeff Daily authored
-
Jeff Daily authored
-
Jeff Daily authored
Mostly whitespace or formatting issues addressed. Diff with upstream is reduced; ROCm changes are more clear.
-
Jeff Daily authored
Conflicts:
    csrc/multi_tensor_apply.cuh
    setup.py
    tests/L0/run_optimizers/test_adagrad.py
    tests/L0/run_optimizers/test_fused_optimizer.py
    tests/L0/run_optimizers/test_lamb.py
-
Jeff Daily authored
Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm
-
- 15 Jan, 2021 1 commit
-
-
Sarunya Pumma authored
-
- 31 Dec, 2020 3 commits
-
-
Chaitanya Sri Krishna Lolla authored
Skip the unit tests
-
lcskrishna authored
-
lcskrishna authored
-
- 17 Dec, 2020 3 commits
-
-
Thor Johnsen authored
Update ASP README to highlight default recipe
-
jpool-nv authored
The Recipe was presented after some non-standard API calls, so moving the suggested usage up, giving it its own section, and reinforcing the suggested usage in the non-standard section.
-
Chaitanya Sri Krishna Lolla authored
Hipify revamp changes for apex extensions on ROCm.
-
- 16 Dec, 2020 1 commit
-
-
lcskrishna authored
-
- 15 Dec, 2020 4 commits
-
-
lcskrishna authored
-
lcskrishna authored
-
lcskrishna authored
-
lcskrishna authored
-
- 10 Dec, 2020 1 commit
-
-
lcskrishna authored
-
- 09 Dec, 2020 2 commits
-
-
lcskrishna authored
-
lcskrishna authored
-
- 04 Dec, 2020 3 commits
-
-
Stas Bekman authored
-
Kexin Yu authored
* add flag for DistributedAdam: step_support_amp_scaling
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
-
Burc Eryilmaz authored
* fuse dropout into softmax in fprop for additive mask case
-
- 02 Dec, 2020 1 commit
-
-
Janusz Lisiecki authored
resume() is a nested function, and when it loads best_prec1 it creates a local variable that hides the one from the parent function (which refers to the global one). This PR adds `global` so that the global variable is modified as intended.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
-
- 01 Dec, 2020 1 commit
-
-
Kexin Yu authored
DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
-