- 19 May, 2022 2 commits
-
-
eqy authored
* check in
* type
* cleanup
* cleanup
* fix function call
* Apply suggestions from code review

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
-
eqy authored
* check in
* fancy context style

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
-
- 18 May, 2022 1 commit
-
-
Masaki Kozuki authored
* NcclDistributedTestBase
* fix stupid mistake
* add UCC test
* add UCC backend
* torch ucc tests
* allow for UCC backend
* Set `UCX_TLS` to `tcp,cuda_copy` & use DDP iff it makes sense
* Apply 4 suggestion(s) to 1 file(s)
* mix & match NCCL & UCC
* use both UCC & NCCL in GPT
* UCC for pipeline parallel, NCCL for the others
* conditionally use UCC
* make UCC guards more friendly
* test raises when torch_ucc isn't available
* Change to member variable from class variable
* pass async_comm to train; I mistakenly dropped it during the rebase
* fix typo: functionality
* Enable tensor parallel only when device count > 4. I want the pipeline model parallel world size to be >= 4 because I previously saw GPT/BERT failing when only UCC is used, so I'm speculating that there's some gotcha around a pipeline size of 4.
* Add NVIDIA driver version guard
* move world_size as it was not correctly reflected
* keep an eye on the nvml API thing
* import unittest

Co-authored-by: Aidyn Aitzhan <31858918+Aidyn-A@users.noreply.github.com>
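The backend mix this commit describes (UCC for the pipeline-parallel group, NCCL for the others) can be sketched roughly as below. This is a minimal illustration, assuming a PyTorch build with the UCC backend (or torch_ucc plugin) available; the rank layout is hypothetical, and only the `UCX_TLS` value is taken from the commit message.

```python
import os

import torch.distributed as dist

# From the commit message: restrict UCX transports to TCP and CUDA copy.
os.environ["UCX_TLS"] = "tcp,cuda_copy"

# Default (world) process group on NCCL; assumes launch via torchrun so the
# usual MASTER_ADDR/RANK/WORLD_SIZE env vars are already set.
dist.init_process_group(backend="nccl")

# Hypothetical pipeline-parallel subgroup placed on the UCC backend, while
# tensor-/data-parallel communication stays on NCCL.
pp_group = dist.new_group(ranks=[0, 1, 2, 3], backend="ucc")
```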
-
- 13 May, 2022 1 commit
-
-
Masaki Kozuki authored
-
- 12 May, 2022 1 commit
-
-
eqy authored
* initial check in
* fix
* fix test
* address some review comments and cleanup
* fix
* bookmark
* fix sync placement to come before gather
* similar fix for non-gather case
* add async BERT
* update GPT minimal test
* allow selection of default PP test
* fix BERT test
* cleanup
* cleanup
-
- 11 May, 2022 1 commit
-
-
Aidyn-A authored
* add loss comparison to test_pipeline_parallel_fwd_bwd
* applied some suggested changes
* update test_pipeline_parallel_fwd_bwd.py
* update test_pipeline_parallel_fwd_bwd.py 2
* minor update
* update test_pipeline_parallel_fwd_bwd.py 3
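The loss comparison this commit adds can be sketched as follows; the tensor values and tolerances here are assumptions for illustration, not the test's actual numbers.

```python
import torch

# Hypothetical per-step losses: one from the pipeline-parallel forward/backward
# run, one from a plain single-process reference run of the same model.
pp_loss = torch.tensor(2.3079)
reference_loss = torch.tensor(2.3081)

# assert_close raises with a readable diff message if the values diverge.
torch.testing.assert_close(pp_loss, reference_loss, rtol=1e-3, atol=1e-3)
```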
-
- 29 Apr, 2022 3 commits
-
-
eqy authored
* fix typo
* Update test_pipeline_parallel_fwd_bwd.py
-
Masaki Kozuki authored
This is cherry-picked for easier comparison with megatron-lm.
-
yjk21 authored
-
- 21 Apr, 2022 1 commit
-
-
Masaki Kozuki authored
* guard
* update
* remove unnecessary version guard
* runtime version guard
* cosmetic
* skip tests appropriately
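A minimal sketch of the runtime version guard pattern these commits describe, assuming the `packaging` library is available; the threshold version, helper name, and test class are illustrative, not the commit's actual code.

```python
import unittest

import torch
from packaging.version import parse as parse_version

# Illustrative threshold; the real guarded version lives in the commit itself.
_MINIMUM_TORCH = parse_version("1.11.0")


def _torch_new_enough() -> bool:
    # torch.__version__ may carry a suffix like "+cu113"; parse() handles it.
    return parse_version(torch.__version__) >= _MINIMUM_TORCH


@unittest.skipIf(not _torch_new_enough(), "requires torch >= 1.11")
class GuardedCase(unittest.TestCase):
    def test_guarded_feature(self):
        self.assertTrue(True)  # placeholder for the actual guarded test
```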
-
- 20 Apr, 2022 1 commit
-
-
Thor Johnsen authored
Peer memory halo exchange
-
- 19 Apr, 2022 1 commit
-
-
Masaki Kozuki authored
* bump version
* add guard
* fix the cond
-
- 14 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 13 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 08 Apr, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 07 Apr, 2022 2 commits
-
-
Masaki Kozuki authored
* add warning to pyprof
* add warning to reparameterization

note: this module is already not import-able, as follows:
```
(base) root@c4bb3f161482:/vscode/apex# python -c 'import torch; import apex; from apex import reparameterization'
/vscode/apex/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
  warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/vscode/apex/apex/reparameterization/__init__.py:2: FutureWarning: reparameterization will be removed by the end of June, 2022
  warnings.warn("reparameterization will be removed by the end of June, 2022", FutureWarning)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/vscode/apex/apex/reparameterization/__init__.py", line 4, in <module>
    from .weight_norm import WeightNorm
  File "/vscode/apex/apex/reparameterization/weight_norm.py", line 3, in <module>
    from ..fp16_utils import Fused_Weight_Norm
ImportError: cannot import name 'Fused_Weight_Norm' from 'apex.fp16_utils' (/vscode/apex/apex/fp16_utils/__init__.py)
```
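The deprecation pattern the commit adds, and which the output above shows firing, boils down to a module-level `warnings.warn` in the package's `__init__.py`. A minimal sketch, with the message text taken from the output above; the exact file contents are an assumption.

```python
# apex/pyprof/__init__.py (sketch)
import warnings

# Fires once at import time for anyone still importing the deprecated package.
warnings.warn(
    "pyprof will be removed by the end of June, 2022",
    FutureWarning,
)
```
-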
Masaki Kozuki authored
* add test
* `destroy_model_parallel` was missing
-
- 05 Apr, 2022 2 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 03 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 02 Apr, 2022 4 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 01 Apr, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 31 Mar, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 30 Mar, 2022 2 commits
-
-
Gil Shomron authored
* Enabled Conv-Bias-ReLU fusion

  The following modules are enabled using cuDNN runtime fusion:
  1) Conv-Bias-ReLU (+backward)
  2) Conv-Bias (+backward)
  3) Conv-Bias-Mask-ReLU (+backward)

* Casts cleanup and autocast in unittest
  - Remove redundant dtype casts
  - Simulate the usage in the unittest by using torch.cuda.amp.autocast

* Fixed save_for_backward

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
Co-authored-by: root <root@luna-0277.selene.nvidia.com>
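The fused modules themselves live inside apex, but the unfused reference computation that the cuDNN runtime fusion replaces can be sketched as below, run under `torch.cuda.amp.autocast` as the unittest change describes. Shapes are illustrative, and a CUDA device is assumed.

```python
import torch
import torch.nn.functional as F

# Inputs for the conv-bias-ReLU pattern targeted by the runtime fusion.
x = torch.randn(8, 64, 56, 56, device="cuda")
weight = torch.randn(128, 64, 3, 3, device="cuda")
bias = torch.randn(128, device="cuda")

# Under autocast the convolution runs in fp16, matching the unittest setup;
# the fused apex module should produce a numerically close result.
with torch.cuda.amp.autocast():
    out = F.relu(F.conv2d(x, weight, bias, padding=1))
```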
-
Thor Johnsen authored
-
- 29 Mar, 2022 2 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 28 Mar, 2022 1 commit
-
-
Thor Johnsen authored
-
- 25 Mar, 2022 3 commits
-
-
yjk21 authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-