- 06 Dec, 2022 2 commits

Hubert Lu authored
* Unskip some unit tests related to issue #82
* Ensure test_state_dict uses capturable=True for torch.optim.Adam
* Fix TestFusedAdam tests in test_fused_optimizer.py
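
As context for the capturable change above, here is a minimal sketch of the round trip a test like test_state_dict exercises, assuming a CUDA device and a PyTorch version that supports the capturable flag (the model and hyperparameters are illustrative, not the actual test code):

```python
# Illustrative sketch of a capturable-Adam state_dict round trip.
# With capturable=True the optimizer state, including the step
# counter, lives on the GPU, so it must survive a save/load cycle.
import torch

model = torch.nn.Linear(4, 4).cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, capturable=True)

model(torch.randn(2, 4, device="cuda")).sum().backward()
opt.step()

# Round-trip the state through a state dict, as test_state_dict would.
state = opt.state_dict()
opt2 = torch.optim.Adam(model.parameters(), lr=1e-3, capturable=True)
opt2.load_state_dict(state)
```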

Hubert Lu authored
* Consider both contiguous and channels_last tensors for FusedSGD
* Consider all memory formats in fused_sgd
* Add a unit test script for nhwc fused_sgd
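
A hedged sketch of the memory-format cases such an nhwc test has to cover; apex.optimizers.FusedSGD is the real class, but the loop below is illustrative rather than the actual test script:

```python
# Illustrative: exercise FusedSGD with both contiguous (NCHW) and
# channels_last (NHWC) parameters. Requires apex built with CUDA.
import torch
from apex.optimizers import FusedSGD

for memory_format in (torch.contiguous_format, torch.channels_last):
    conv = torch.nn.Conv2d(3, 8, 3).cuda().to(memory_format=memory_format)
    opt = FusedSGD(conv.parameters(), lr=0.1, momentum=0.9)
    x = torch.randn(2, 3, 16, 16, device="cuda").to(memory_format=memory_format)
    conv(x).sum().backward()
    opt.step()  # the fused kernel must handle either layout
```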

- 10 Aug, 2022 1 commit

hubertlu-tw authored

- 08 Aug, 2022 1 commit

Hubert Lu authored
* Skip the failing unit tests from the FusedRMSNorm PR
* Update test_lamb.py

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>

- 23 Jun, 2022 1 commit

Tim Moon authored
* Increase default bucket size in distributed Adam
* Move distributed Adam unit test to contrib tests and integrate it into the unit testing framework
* Tweak hyperparameters for the distributed Adam optimizer test; improves numerical stability so tight tolerances can be kept (adopting suggestions from @crcrpar)
* Use distributed test infrastructure in the distributed Adam unit test (suggestion from @crcrpar)

- 22 Jun, 2022 1 commit

Masaki Kozuki authored
* Add temporary dispatch of double, float, half, and bfloat16
* Support bfloat16 in FusedAdam
* Add bfloat16 path to FusedAdam
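
A minimal sketch of what the new bfloat16 path permits, assuming an apex build with CUDA support (the model setup is illustrative):

```python
# Illustrative: FusedAdam stepping bfloat16 parameters directly,
# exercising the bfloat16 dispatch added here. Requires CUDA + apex.
import torch
from apex.optimizers import FusedAdam

model = torch.nn.Linear(16, 16).cuda().to(torch.bfloat16)
opt = FusedAdam(model.parameters(), lr=1e-3)

x = torch.randn(4, 16, device="cuda", dtype=torch.bfloat16)
model(x).sum().backward()
opt.step()
```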

- 14 Jun, 2022 2 commits

- 14 Dec, 2021 1 commit

Hubert Lu authored
* Skip failing unit tests
* Modify the test-skipping messages

- 09 Dec, 2021 2 commits

Kevin Stephano authored
* Add fused mixed-precision LAMB optimizer
* Fix device usage in constructor
* Fix sending param_group tensor state to device
* Remove unneeded device set

Kevin Stephano authored
* Add fused mixed-precision LAMB optimizer
* Fix device usage in constructor
* Fix sending param_group tensor state to device
* Remove unneeded device set

- 19 Oct, 2021 1 commit

Hubert Lu authored

- 15 Apr, 2021 1 commit

Sudhakar Singh authored
* Add unit tests for fused Novograd
* Fix: tensors should reside on the same device
* Fix: the CUDA stream should be obtained on the same device the tensors reside on (found while debugging the fused Novograd multi-device unit test)
* Fix issues mentioned in the review comments
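
The stream fix above boils down to querying the CUDA stream on the device that actually holds the tensors; a sketch of the safe pattern, assuming at least two GPUs (the tensors and the update are illustrative):

```python
# Illustrative: when parameters live on a non-default GPU, the stream
# (and kernel launches) must be taken on that same device. Switching
# the current device first ensures torch.cuda.current_stream() refers
# to the device holding the tensors. Assumes >= 2 GPUs.
import torch

p = torch.randn(10, device="cuda:1", requires_grad=True)
p.sum().backward()

with torch.cuda.device(p.device):
    stream = torch.cuda.current_stream()  # stream on cuda:1, not cuda:0
    with torch.cuda.stream(stream):
        p.data.add_(p.grad, alpha=-0.1)  # update runs on the right device
```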

- 21 Jan, 2021 1 commit

Jeff Daily authored
Use __launch_bounds__(1024) for multi_tensor_apply; re-enable skipped tests

- 18 Jan, 2021 1 commit

Jeff Daily authored

- 31 Dec, 2020 2 commits

lcskrishna authored

lcskrishna authored

- 01 Dec, 2020 1 commit

Kexin Yu authored
DistributedFusedAdam Model Parallelism Support (Megatron)
Co-authored-by: Kexin Yu <kexiny@nvidia.com>
Co-authored-by: Kexin Yu <kexinznzn@gmail.com>

- 05 Aug, 2020 1 commit

ngimel authored
* Add device guards to the optimizers
* Add untracked file
* Set deviceGuard in multi_tensor_apply
* Address review comments; fix LAMB
* Fix indentation
* Fix typo
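
The user-visible effect of those device guards, sketched in Python; the commit itself changes the C++ side of multi_tensor_apply, so this snippet only illustrates the multi-device scenario the guards make safe, and assumes a second GPU plus an apex build:

```python
# Illustrative: with device guards in place, a fused optimizer can
# step parameters living on a GPU other than the current default.
import torch
from apex.optimizers import FusedAdam

torch.cuda.set_device(0)                    # default device is cuda:0
model = torch.nn.Linear(8, 8).to("cuda:1")  # params live on cuda:1
opt = FusedAdam(model.parameters(), lr=1e-3)

x = torch.randn(2, 8, device="cuda:1")
model(x).sum().backward()
opt.step()  # the guard switches to cuda:1 for the fused kernels
```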

- 07 Jul, 2020 1 commit

lcskrishna authored

- 23 Jun, 2020 3 commits

- 26 May, 2020 1 commit

rohithkrn authored

- 14 May, 2020 1 commit

Andrew Tulloch authored

- 03 Sep, 2019 1 commit

Deyu Fu authored
* Move import of amp_C to __init__()
* Make fp16/fp32 separate lists to support mixed param types; disable the double test
* Make zero_grad consistent between Adam, Novograd, and LAMB
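
A sketch of the fp16/fp32 split from the second bullet, hand-rolled here for illustration (the real lists are built inside the optimizers):

```python
# Illustrative: partition parameters by dtype so each multi-tensor
# kernel launch sees a homogeneous list, letting one optimizer handle
# models that mix fp16 and fp32 parameters.
import torch

def split_by_dtype(params):
    fp16, fp32 = [], []
    for p in params:
        (fp16 if p.dtype == torch.float16 else fp32).append(p)
    return fp16, fp32

params = [torch.randn(4, dtype=torch.float16),
          torch.randn(4, dtype=torch.float32)]
fp16_list, fp32_list = split_by_dtype(params)
```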

- 17 Aug, 2019 1 commit

Deyu Fu authored

- 13 Aug, 2019 1 commit

Deyu Fu authored
* FusedSGD now works as before
* FusedAdam now works with O1/O2 and no longer fuses scaling and casting
* Removed special backend handling for FusedAdam
* Moved and updated the FusedAdam test into run_optimizers
* Removed legacy tests for optimizers.FP16_optimizer and FusedAdam in run_mixed_adam
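
With this change, FusedAdam goes through the ordinary amp path; a sketch of O2 usage, assuming apex with amp and a CUDA device (the model and loss are illustrative):

```python
# Illustrative: FusedAdam used via the standard amp.initialize path at
# O1/O2; loss scaling and casting are handled by amp rather than being
# fused into the optimizer itself.
import torch
from apex import amp
from apex.optimizers import FusedAdam

model = torch.nn.Linear(16, 16).cuda()
optimizer = FusedAdam(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

loss = model(torch.randn(4, 16, device="cuda")).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```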

- 12 Aug, 2019 1 commit

Deyu Fu authored