- 23 Jun, 2022 1 commit
-
-
Tim Moon authored
* Increase default bucket size in distributed Adam
* Move distributed Adam unit test to contrib tests. Integrate into unit testing framework.
* Tweak hyperparameters for dist Adam optimizer test. Improves numerical stability so we can keep tight tolerances. Adopting suggestions from @crcrpar.
* Use distributed test infrastructure in distributed Adam unit test. Suggestion from @crcrpar.
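For context, a minimal sketch of constructing the contrib optimizer with an explicit gradient bucket size. The `bucket_cap_mb` keyword and the hyperparameters below are assumptions for illustration, not taken from the commit; an NCCL process group launched via torchrun is assumed.
```python
# Sketch only: bucket_cap_mb is an assumed name for the bucket-size knob this commit enlarges.
import torch
import torch.distributed as dist
from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam

dist.init_process_group(backend="nccl")  # assumes torchrun set RANK/WORLD_SIZE/MASTER_ADDR
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = DistributedFusedAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    bucket_cap_mb=200,  # assumption: larger buckets amortize communication overhead
)
```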
-
- 22 Jun, 2022 2 commits
-
-
Masaki Kozuki authored
* add temporary dispatch of double, float, half, bfloat16
* fusedadam of bfloat16
* Add bfloat16 path to FusedAdam
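A minimal sketch, assuming the bfloat16 path this commit adds, of driving FusedAdam from a bf16 model; the model and hyperparameters are illustrative only.
```python
# Illustrative sketch: FusedAdam stepping a bfloat16 model on GPU.
import torch
from apex.optimizers import FusedAdam

model = torch.nn.Linear(256, 256).to(device="cuda", dtype=torch.bfloat16)
optimizer = FusedAdam(model.parameters(), lr=1e-3)

x = torch.randn(32, 256, device="cuda", dtype=torch.bfloat16)
loss = model(x).float().sum()  # accumulate the loss in fp32
loss.backward()
optimizer.step()
optimizer.zero_grad()
```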
-
Tim Moon authored
* Gradient clipping routine with fused kernels. Identical API to PyTorch's. Falls back to the PyTorch implementation when not computing the L2 norm.
* Add unit test for gradient clipping
* Add fp16 case to gradient clipping unit test
* Tweaks to grad clipping unit test. Review suggestions from @crcrpar.
* Debug gradient clipping tests. When checking that incorrect results produce assertion errors, make sure to generate a discrepancy outside the range of numerical error.
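A small sketch based on the description above ("identical API to PyTorch's"); the apex.contrib import path is an assumption, and with `norm_type != 2` the routine reportedly falls back to the plain PyTorch implementation.
```python
# Sketch: same call signature as torch.nn.utils.clip_grad_norm_.
import torch
from apex.contrib.clip_grad import clip_grad_norm_  # assumed import location

model = torch.nn.Linear(64, 64).cuda()
model(torch.randn(8, 64, device="cuda")).sum().backward()

# Fused kernels are used for the L2-norm case; other norm_type values fall back to PyTorch.
total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2.0)
print(total_norm)
```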
-
- 16 Jun, 2022 1 commit
-
-
Kevin Stephano authored
Remove legacy fuser usage from multihead attention in contrib in favor of the default, which should be nvfuser. Modify test scripts to activate fusion. (#1403)
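For context, a sketch of how test scripts of that PyTorch generation typically activated nvfuser for TorchScript in place of the legacy fuser; these are private, version-dependent `torch._C` hooks, so treat the exact calls as assumptions.
```python
# Sketch: switch TorchScript fusion to nvfuser (private hooks, PyTorch ~1.11/1.12 era).
import torch

torch._C._jit_set_nvfuser_enabled(True)        # route fusion groups to nvfuser
torch._C._jit_set_texpr_fuser_enabled(False)   # disable the NNC/texpr fuser
torch._C._jit_override_can_fuse_on_gpu(False)  # disable the legacy GPU fuser
torch._C._jit_override_can_fuse_on_cpu(False)  # disable the legacy CPU fuser

@torch.jit.script
def fused_bias_gelu(x, bias):
    # A small elementwise chain that the fuser can turn into one kernel.
    y = x + bias
    return y * 0.5 * (1.0 + torch.erf(y / 1.41421356237))
```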
-
- 14 Jun, 2022 3 commits
-
-
Thor Johnsen authored
ZeRO-2 support in DistributedFusedAdam
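A conceptual sketch of what ZeRO-2 means here, not apex's actual implementation: gradients are reduce-scattered so each data-parallel rank keeps only the shard it needs for its slice of the Adam state.
```python
# Conceptual only: gradient sharding via reduce-scatter (ZeRO-2 style).
import torch
import torch.distributed as dist

def reduce_scatter_grad(full_grad: torch.Tensor) -> torch.Tensor:
    # Assumes full_grad.numel() is divisible by the world size.
    world = dist.get_world_size()
    shards = list(full_grad.flatten().chunk(world))
    my_shard = torch.empty_like(shards[dist.get_rank()])
    dist.reduce_scatter(my_shard, shards, op=dist.ReduceOp.SUM)
    my_shard /= world  # average across data-parallel ranks
    return my_shard    # the optimizer updates only this rank's shard
```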
-
Tim Moon authored
Adjust test options to have tighter tolerances.
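Illustration only (not from the commit): tightening per-test tolerances with torch.testing; the values below are arbitrary.
```python
# Toy example of an explicit, tighter tolerance in a unit test.
import torch

actual = torch.tensor([1.0000, 2.0001])
expected = torch.tensor([1.0, 2.0])
torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-4)
```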
-
Tim Moon authored
-
- 13 Jun, 2022 1 commit
-
-
Tim Moon authored
-
- 31 May, 2022 1 commit
-
-
eqy authored
Do pipeline parallelism tests in double because TF32 environment variables can be painful to manage across test suites (#1391)
* check in
* skip interleaved with 2 GPU
* change type annotation
* address comments, thanks @crcrpar @Aidyn-A
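For context, the TF32 behavior being worked around. The programmatic switches below are standard PyTorch APIs; running the comparison in float64, as this PR does, sidesteps TF32 entirely.
```python
# Sketch: two ways to keep TF32 out of a numerical test.
import torch

# Option 1: turn TF32 off for matmul and cuDNN in-process.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

# Option 2 (the approach in this PR): run the check in double, where TF32 never applies.
a = torch.randn(128, 128, device="cuda", dtype=torch.float64)
b = torch.randn(128, 128, device="cuda", dtype=torch.float64)
torch.testing.assert_close(a @ b, (a.cpu() @ b.cpu()).cuda())
```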
-
- 20 May, 2022 1 commit
-
-
Aidyn-A authored
* add grad check
* change assert
* minor changes
* revert unnecessary changes
* suggested changes
* fix tensor comparison
* small changes
-
- 19 May, 2022 2 commits
-
-
eqy authored
* check in
* type
* cleanup
* cleanup
* fix function call
* Apply suggestions from code review

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
-
eqy authored
* check in
* fancy context style

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
-
- 18 May, 2022 1 commit
-
-
Masaki Kozuki authored
* NcclDistributedTestBase
* fix stupid mistake
* add UCC test
* add UCC backend
* torch ucc tests
* allows for UCC backend
* Set `UCX_TLS` to `tcp,cuda_copy` & use DDP iff it makes sense
* Apply 4 suggestion(s) to 1 file(s)
* mix & match NCCL & UCC
* use both UCC & NCCL in GPT
* UCC for pipeline parallel, NCCL for the others
* conditionally use UCC
* make UCC guards more friendly
* test raises when torch_ucc isn't available
* Change to member variable from class variable
* pass async_comm to train; I mistakenly dropped it during the rebase
* fix typo: functionality
* Enable tensor parallel only when device count > 4. I want pipeline model parallel world size to be >= 4 because previously I saw GPT/BERT failing when only UCC is used, so I'm speculating that there's some gotcha around pipeline size of 4.
* Add nvidia driver version guard
* move world_size as it was not correctly reflected
* keep an eye on the nvml api thing
* import unittest

Co-authored-by: Aidyn Aitzhan <31858918+Aidyn-A@users.noreply.github.com>
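A sketch of the arrangement described above: UCC for the pipeline-parallel group, NCCL for the default group, and `UCX_TLS=tcp,cuda_copy`. The group composition and the availability of a UCC backend (via the torch_ucc plugin or a UCC-enabled PyTorch build) are assumptions.
```python
# Sketch: mixed NCCL/UCC process groups for a distributed test.
import os
import torch.distributed as dist

os.environ["UCX_TLS"] = "tcp,cuda_copy"  # restrict UCX to TCP + CUDA-copy transports

# Default (world) process group on NCCL; assumes a launcher set the usual env vars.
dist.init_process_group(backend="nccl")

# A separate group on UCC, e.g. for pipeline-parallel point-to-point communication.
pipeline_group = dist.new_group(
    ranks=list(range(dist.get_world_size())),
    backend="ucc",  # requires torch_ucc (or built-in UCC support)
)
```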
-
- 13 May, 2022 1 commit
-
-
Masaki Kozuki authored
-
- 12 May, 2022 1 commit
-
-
eqy authored
* initial check in
* fix
* fix test
* address some review comments and cleanup
* fix
* bookmark
* fix sync placement to come before gather
* similar fix for non-gather case
* add async bert
* update gpt minimal test
* allow selection of default pp test
* fix bert test
* cleanup
* cleanup
-
- 11 May, 2022 1 commit
-
-
Aidyn-A authored
* add loss comparison to test_pipeline_parallel_fwd_bwd
* applied some suggested changes
* update test_pipeline_parallel_fwd_bwd.py
* update test_pipeline_parallel_fwd_bwd.py 2
* minor update
* update test_pipeline_parallel_fwd_bwd.py 3
-
- 29 Apr, 2022 3 commits
-
-
eqy authored
* fix typo
* Update test_pipeline_parallel_fwd_bwd.py
-
Masaki Kozuki authored
This is cherry-picked for easier comparison with megatron-lm.
-
yjk21 authored
-
- 21 Apr, 2022 1 commit
-
-
Masaki Kozuki authored
* guard
* update
* remove unnecessary version guard
* runtime version guard
* cosmetic
* skip tests appropriately
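A hypothetical sketch of a runtime version guard in the spirit of these commits; the threshold and the guarded test are made up for illustration.
```python
# Sketch: skip tests at runtime based on the environment rather than at import time.
import unittest
import torch

CUDA_AVAILABLE = torch.cuda.is_available() and torch.version.cuda is not None
TORCH_VERSION = tuple(int(v) for v in torch.__version__.split(".")[:2])

class FusedKernelTest(unittest.TestCase):
    @unittest.skipUnless(CUDA_AVAILABLE, "requires a CUDA runtime")
    @unittest.skipIf(TORCH_VERSION < (1, 11), "requires torch >= 1.11")  # illustrative threshold
    def test_something(self):
        self.assertTrue(True)
```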
-
- 20 Apr, 2022 1 commit
-
-
Thor Johnsen authored
Peer memory halo exchange
-
- 19 Apr, 2022 1 commit
-
-
Masaki Kozuki authored
* bump version
* add guard
* fix the cond
-
- 14 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 13 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 08 Apr, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 07 Apr, 2022 2 commits
-
-
Masaki Kozuki authored
* add warning to pyprof
* add warning to reparameterization

Note: this module is already not import-able, as follows:
```
(base) root@c4bb3f161482:/vscode/apex# python -c 'import torch; import apex; from apex import reparameterization'
/vscode/apex/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
  warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/vscode/apex/apex/reparameterization/__init__.py:2: FutureWarning: reparameterization will be removed by the end of June, 2022
  warnings.warn("reparameterization will be removed by the end of June, 2022", FutureWarning)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/vscode/apex/apex/reparameterization/__init__.py", line 4, in <module>
    from .weight_norm import WeightNorm
  File "/vscode/apex/apex/reparameterization/weight_norm.py", line 3, in <module>
    from ..fp16_utils import Fused_Weight_Norm
ImportError: cannot import name 'Fused_Weight_Norm' from 'apex.fp16_utils' (/vscode/apex/apex/fp16_utils/__init__.py)
```
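The deprecation warning itself, as captured in the output above; a minimal sketch of what the touched `__init__.py` emits on import.
```python
# Module-level deprecation notice, matching the FutureWarning shown in the log above.
import warnings

warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
```
-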
Masaki Kozuki authored
* add test
* destroy model parallel was missing
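A sketch of the missing cleanup the commit mentions, assuming apex.transformer's Megatron-style `parallel_state` API; the keyword argument name and sizes are assumptions, and an initialized torch.distributed process group is assumed.
```python
# Sketch: tear down model-parallel groups after a test so later tests start clean.
from apex.transformer import parallel_state

parallel_state.initialize_model_parallel(tensor_model_parallel_size_=2)  # assumed kwarg name
try:
    pass  # ... run the test body ...
finally:
    parallel_state.destroy_model_parallel()  # the cleanup that was missing
```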
-
- 05 Apr, 2022 2 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 03 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 02 Apr, 2022 4 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 01 Apr, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 31 Mar, 2022 1 commit
-
-
Thor Johnsen authored
-