- 04 Mar, 2021 1 commit
-
-
Peng authored
Revert "pass all TensorListMetadata as pointer to pinned host memory (#13)
-
- 25 Feb, 2021 1 commit
-
-
Jeff Daily authored
This reverts commit bdd481d1.
-
- 25 Jan, 2021 1 commit
-
-
Jeff Daily authored
- incorrect use of __shfl_down
- fix warp size assumptions
- update unit tests to exit on failure
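The warp-size fix listed above can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the kernel code from this commit: `shfl_down` and `warp_reduce_sum` are hypothetical helpers that imitate a shuffle-down tree reduction, and the model assumes that a lane whose source lane is out of range keeps its own value. It shows how a reduction written for 32 lanes silently drops half of the data when the hardware executes 64 lanes together.

```python
def shfl_down(values, delta):
    """Simplified model of a shuffle-down: lane i reads lane i + delta.

    Assumption: a lane whose source is out of range gets its own value back.
    """
    width = len(values)
    return [values[i + delta] if i + delta < width else values[i]
            for i in range(width)]


def warp_reduce_sum(values, assumed_warp_size):
    """Tree reduction over one warp; the result is read from lane 0.

    `values` holds one entry per hardware lane. `assumed_warp_size` is the
    lane count the code was written for, which may differ from len(values).
    """
    vals = list(values)
    offset = assumed_warp_size // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


lanes = list(range(64))                      # a 64-lane wavefront
full = warp_reduce_sum(lanes, 64)            # reduction sized for 64 lanes
partial = warp_reduce_sum(lanes, 32)         # reduction hard-coded for 32 lanes
print(full, partial)
```

In this model `full` equals the sum of all 64 lanes, while `partial` only covers lanes 0 to 31, which is why the lane count has to come from the target rather than from a constant.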
-
- 21 Jan, 2021 2 commits
-
-
Jeff Daily authored
-
Jeff Daily authored
use __launch_bounds__(1024) for multi_tensor_apply, re-enable skipped tests
-
- 19 Jan, 2021 1 commit
-
-
Jeff Daily authored
IFU-2021-01-18
-
- 18 Jan, 2021 5 commits
-
-
Jeff Daily authored
-
Jeff Daily authored
-
Jeff Daily authored
Mostly whitespace or formatting issues addressed. Diff with upstream is reduced; ROCm changes are more clear.
-
Jeff Daily authored
Conflicts:
    csrc/multi_tensor_apply.cuh
    setup.py
    tests/L0/run_optimizers/test_adagrad.py
    tests/L0/run_optimizers/test_fused_optimizer.py
    tests/L0/run_optimizers/test_lamb.py
-
Jeff Daily authored
Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm
-
- 15 Jan, 2021 1 commit
-
-
Sarunya Pumma authored
-
- 31 Dec, 2020 3 commits
-
-
Chaitanya Sri Krishna Lolla authored
Skip the unit tests
-
lcskrishna authored
-
lcskrishna authored
-
- 17 Dec, 2020 3 commits
-
-
Thor Johnsen authored
Update ASP README to highlight default recipe
-
jpool-nv authored
The Recipe was presented after some non-standard API calls, so moving the suggested usage up, giving it its own section, and reinforcing the suggested usage in the non-standard section.
-
Chaitanya Sri Krishna Lolla authored
Hipify revamp changes for apex extensions on ROCm.
-
- 16 Dec, 2020 1 commit
-
-
lcskrishna authored
-
- 15 Dec, 2020 4 commits
-
-
lcskrishna authored
-
lcskrishna authored
-
lcskrishna authored
-
lcskrishna authored
-
- 10 Dec, 2020 1 commit
-
-
lcskrishna authored
-
- 09 Dec, 2020 2 commits
-
-
lcskrishna authored
-
lcskrishna authored
-
- 04 Dec, 2020 3 commits
-
-
Stas Bekman authored
-
Kexin Yu authored
* add flag for DistributedAdam: step_support_amp_scaling
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
-
Burc Eryilmaz authored
* fuse dropout into softmax in fprop for additive mask case
-
- 02 Dec, 2020 1 commit
-
-
Janusz Lisiecki authored
- resume() is a nested function, and when it loads best_prec1 it creates a local variable that hides the one from the parent function (which refers to the global one). This PR adds `global` so that the global variable is modified as intended.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
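The scoping problem described above can be reproduced in a few lines. This is a minimal sketch, not the example script the commit touches: only the names `resume` and `best_prec1` come from the commit message, while `main_without_global`, `main_with_global`, `current_best`, and the checkpoint dictionary are hypothetical.

```python
best_prec1 = 0.0  # module-level state


def current_best():
    """Return the module-level best_prec1."""
    return best_prec1


def main_without_global(checkpoint):
    global best_prec1  # the parent function refers to the global variable

    def resume():
        # Assignment inside a nested function binds a new local name,
        # so the module-level best_prec1 is left untouched.
        best_prec1 = checkpoint["best_prec1"]
        return best_prec1

    resume()


def main_with_global(checkpoint):
    global best_prec1

    def resume():
        global best_prec1  # the fix: declare the name global here as well
        best_prec1 = checkpoint["best_prec1"]

    resume()


main_without_global({"best_prec1": 76.5})
before_fix = current_best()   # still the initial value

main_with_global({"best_prec1": 76.5})
after_fix = current_best()    # now the value loaded from the checkpoint
```

The `global` declaration in the parent does not carry over to nested functions; each function that assigns to the name needs its own declaration.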
-
- 01 Dec, 2020 1 commit
-
-
Kexin Yu authored
DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>
-
- 04 Nov, 2020 1 commit
-
-
Ashish Farmer authored
* fix warp size in WARP_SHFL* in layernorm
* enable fused_layer_norm tests on ROCm
-
- 19 Oct, 2020 1 commit
-
-
lly-zero-one authored
In this PR, we mainly optimize the performance of SyncBatchNorm and fix one potential issue in the welford_parallel kernel implementation. For the performance improvement, we batch the mean/var/count all_gather communication together and send it once in the forward path. We also batch the all_reduce in the backward path. In addition, we add a contiguous call on the input of the welford_parallel kernel. If there is any standard perf benchmark, I would be happy to run it.
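The batching idea in the forward path can be sketched without any distributed runtime. The code below is an illustration under stated assumptions, not the code from this PR: `pack_stats`, `unpack_stats`, and `merge_stats` are hypothetical helpers, the `gathered` list stands in for the result of a single all_gather call, and the variance is assumed to be the biased (population) variance. Each rank packs its per-channel mean, variance, and element count into one flat buffer, so one collective is needed instead of three, and the gathered buffers are then merged into global statistics.

```python
def pack_stats(mean, var, count):
    """Pack per-channel mean, per-channel biased variance, and the element
    count into one flat buffer so they can travel in a single message."""
    return list(mean) + list(var) + [float(count)]


def unpack_stats(buf, channels):
    """Inverse of pack_stats."""
    return buf[:channels], buf[channels:2 * channels], buf[2 * channels]


def merge_stats(gathered, channels):
    """Combine per-rank statistics into a global mean and biased variance."""
    total = sum(unpack_stats(buf, channels)[2] for buf in gathered)

    mean = [0.0] * channels
    for buf in gathered:
        m, _, n = unpack_stats(buf, channels)
        for c in range(channels):
            mean[c] += m[c] * n / total

    var = [0.0] * channels
    for buf in gathered:
        m, v, n = unpack_stats(buf, channels)
        for c in range(channels):
            # Within-rank variance plus the shift of the rank mean
            # relative to the global mean, weighted by element count.
            var[c] += (v[c] + (m[c] - mean[c]) ** 2) * n / total

    return mean, var


# Two simulated ranks with one channel each:
# rank 0 saw [1, 3] (mean 2, var 1), rank 1 saw [5, 7] (mean 6, var 1).
gathered = [pack_stats([2.0], [1.0], 2), pack_stats([6.0], [1.0], 2)]
global_mean, global_var = merge_stats(gathered, channels=1)
```

For the two simulated ranks, the merged result matches the mean and biased variance of the combined data `[1, 3, 5, 7]`.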
-
- 29 Sep, 2020 1 commit
-
-
ptrblck authored
-
- 15 Sep, 2020 1 commit
-
-
Thor Johnsen authored
Update asp readme
-
- 14 Sep, 2020 2 commits
- 21 Aug, 2020 1 commit
-
-
Chaitanya Sri Krishna Lolla authored
-
- 18 Aug, 2020 1 commit
-
-
Chaitanya Sri Krishna Lolla authored
* enable deprecated fused adam optimizer
* enable deprecated fused lamb
* enable xentropy extension
* add warpsize 32 for nv and 64 for amd
* update compiler arguments
* update the syncwarp conditions
* update syncwarp condition
-
- 17 Aug, 2020 1 commit
-
-
Chaitanya Sri Krishna Lolla authored
* enable deprecated fused adam optimizer
* enable deprecated fused lamb
* reset the compiler arguments
* syntax error
* aligning the compiler arguments
-