- 20 Apr, 2022 1 commit
Thor Johnsen authored
Peer memory halo exchange
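For readers unfamiliar with the term: a halo exchange swaps the boundary rows of a spatially partitioned tensor with neighboring ranks. Below is a minimal, hypothetical sketch of the idea using generic torch.distributed P2P calls; the function and variable names are illustrative only, and this is not the Apex peer-memory implementation, which relies on direct GPU peer access rather than these calls.

```python
# Hypothetical sketch of a halo exchange between neighboring ranks.
# Assumes an initialized process group; not Apex's peer-memory kernel.
import torch
import torch.distributed as dist

def exchange_halos(x: torch.Tensor, halo: int) -> torch.Tensor:
    """Exchange `halo` rows with the previous/next rank along dim 0."""
    rank, world = dist.get_rank(), dist.get_world_size()
    prev_rank, next_rank = (rank - 1) % world, (rank + 1) % world

    send_up, send_down = x[:halo].contiguous(), x[-halo:].contiguous()
    recv_up, recv_down = torch.empty_like(send_up), torch.empty_like(send_down)

    reqs = [
        dist.isend(send_up, prev_rank),
        dist.isend(send_down, next_rank),
        dist.irecv(recv_up, prev_rank),
        dist.irecv(recv_down, next_rank),
    ]
    for r in reqs:
        r.wait()

    # Stitch the received halo rows around the local shard.
    return torch.cat([recv_up, x, recv_down], dim=0)
```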
-
- 19 Apr, 2022 1 commit
Masaki Kozuki authored
* bump version
* add guard
* fix the cond
-
- 14 Apr, 2022 1 commit
Thor Johnsen authored
-
- 13 Apr, 2022 1 commit
Thor Johnsen authored
-
- 08 Apr, 2022 3 commits
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 07 Apr, 2022 2 commits
Masaki Kozuki authored
* add warning to pyprof
* add warning to reparameterization

Note: this module is already not import-able, as follows:

```
(base) root@c4bb3f161482:/vscode/apex# python -c 'import torch; import apex; from apex import reparameterization'
/vscode/apex/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
  warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/vscode/apex/apex/reparameterization/__init__.py:2: FutureWarning: reparameterization will be removed by the end of June, 2022
  warnings.warn("reparameterization will be removed by the end of June, 2022", FutureWarning)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/vscode/apex/apex/reparameterization/__init__.py", line 4, in <module>
    from .weight_norm import WeightNorm
  File "/vscode/apex/apex/reparameterization/weight_norm.py", line 3, in <module>
    from ..fp16_utils import Fused_Weight_Norm
ImportError: cannot import name 'Fused_Weight_Norm' from 'apex.fp16_utils' (/vscode/apex/apex/fp16_utils/__init__.py)
```
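For context, the deprecation pattern referenced here is simply a FutureWarning emitted from the package's __init__.py; the warning text below is taken from the log above, and the surrounding file layout is assumed.

```python
# Minimal sketch of the deprecation warning added to apex.pyprof's __init__.py.
import warnings

warnings.warn(
    "pyprof will be removed by the end of June, 2022",
    FutureWarning,
)
```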
Masaki Kozuki authored
* add test
* destroy model parallel was missing
-
- 05 Apr, 2022 2 commits
Thor Johnsen authored
-
Thor Johnsen authored
-
- 03 Apr, 2022 1 commit
Thor Johnsen authored
-
- 02 Apr, 2022 4 commits
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 01 Apr, 2022 3 commits
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 31 Mar, 2022 3 commits
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 30 Mar, 2022 2 commits
Gil Shomron authored
* Enabled Conv-Bias-ReLU fusion
  The following modules are enabled using cuDNN runtime fusion:
  1) Conv-Bias-ReLU (+backward)
  2) Conv-Bias (+backward)
  3) Conv-Bias-Mask-ReLU (+backward)
* Casts cleanup and autocast in unittest
  - Remove redundant dtype casts
  - Simulate the usage in the unittest by using torch.cuda.amp.autocast
* Fixed save_for_backward

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
Co-authored-by: root <root@luna-0277.selene.nvidia.com>
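A minimal sketch of exercising a conv + bias + ReLU pattern under torch.cuda.amp.autocast, as the unittest change above describes; the modules below are plain PyTorch stand-ins, not the Apex fused implementations, and the shapes are arbitrary.

```python
# Conv + bias + ReLU under autocast (requires a CUDA device).
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=True).cuda()
x = torch.randn(4, 8, 32, 32, device="cuda", requires_grad=True)

with torch.cuda.amp.autocast():
    y = F.relu(conv(x))      # conv + bias + ReLU, the pattern targeted by runtime fusion
y.float().sum().backward()   # exercise the backward path as well
```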
-
Thor Johnsen authored
-
- 29 Mar, 2022 2 commits
Thor Johnsen authored
-
Thor Johnsen authored
-
- 28 Mar, 2022 1 commit
Thor Johnsen authored
-
- 25 Mar, 2022 7 commits
yjk21 authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Masaki Kozuki authored
* try PyTorch custom TestCase class
* revert
* initial working example
* update
* data utils
* fix imports
* hardcode backend to nccl
* fix signature
* fix typo
* mapping
* set device
* init
* refactor cross entropy
* remove unused import & destroy model parallel
* refactor random
* fix test
* remove migrated tests
* refactor
* init
* separate affine weight init
* init model parallel
* split more
* weight init fix part 1
* use cpu init for consistency between native and tensor parallel
* black
* add col parallel
* use a 3D tensor of square matrix for column parallel linear
* skip the failing cases
* migrate layers test
* pipeline parallel forward/backward
* fix typo
* fix typo
* fix
* fix pipeline world size
* black
* rm `run_pipeline_parallel_test` in favor of test_pipeline_parallel_fwd_bwd.py
* stop logging
* set log level
* black
* license and format
* fix
* skip tf32 as matrices are small
* remove potentially inappropriate license
* Apply suggestions from code review
* remove `TODO` comment
* `torch.testing.assert_allclose` -> `torch.testing.assert_close`
* remove comment-outs
* remove unused import
* minor fix
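For context, a hypothetical sketch of the test pattern described above: initialize a process group with the NCCL backend, pin one device per rank, and compare results with torch.testing.assert_close. The Apex-specific model-parallel setup/teardown calls are omitted, and the launch command in the comment is an assumption.

```python
# Launched e.g. via `torchrun --nproc_per_node=2 this_file.py` (assumed).
import os
import torch
import torch.distributed as dist

def run_worker() -> None:
    dist.init_process_group(backend="nccl")   # backend hardcoded to nccl
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)         # one GPU per process

    torch.manual_seed(0)                      # same data on every rank
    x = torch.randn(4, 4, device="cuda")
    expected = x * dist.get_world_size()
    dist.all_reduce(x)                        # in-place sum across ranks
    torch.testing.assert_close(x, expected)   # replaces deprecated assert_allclose

    dist.destroy_process_group()

if __name__ == "__main__":
    run_worker()
```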
-
Masaki Kozuki authored
* update
* Add comment to `destroy_model_parallel`
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 24 Mar, 2022 4 commits
Thor Johnsen authored
-
Masaki Kozuki authored
Take-over of #1097
* Add fast CUDA focal loss implementation.
* Enable fast math for CUDA focal loss.
* Correct typo.
* replace deprecated macros
* Add fast CUDA focal loss implementation.
* Enable fast math for CUDA focal loss.
* Correct typo.
* replace deprecated macros
* TORCH_CUDA_CHECK -> AT_CUDA_CHECK
  The former is defined in torch/csrc/profiler/cuda.cpp so it's not available usually.
  The latter however is defined in ATen/cuda/Exceptions.h as an alias of C10_CUDA_CHECK.
* add test
* clean up
* guard for torchvision

Co-authored-by: Wil Kong <alpha0422@gmail.com>
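For context, a reference (non-fused) focal-loss computation in plain PyTorch, following the standard formulation FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); this Python sketch only illustrates the math and does not represent the fast CUDA kernel added above.

```python
# Reference binary focal loss in plain PyTorch (not the Apex CUDA kernel).
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t), averaged over elements."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t).pow(gamma) * ce).mean()
```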
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 23 Mar, 2022 2 commits
Thor Johnsen authored
-
Thor Johnsen authored
-