- 19 May, 2022 2 commits
-
-
eqy authored
* check in
* type
* cleanup
* cleanup
* fix function call
* Apply suggestions from code review

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
-
eqy authored
* check in
* fancy context style

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
-
- 18 May, 2022 1 commit
-
-
Masaki Kozuki authored
* NcclDistributedTestBase
* fix stupid mistake
* add UCC test
* add UCC backend
* torch ucc tests
* allow for UCC backend
* Set `UCX_TLS` to `tcp,cuda_copy` & use DDP iff it makes sense
* Apply 4 suggestion(s) to 1 file(s)
* mix & match NCCL & UCC
* use both UCC & NCCL in GPT
* UCC for pipeline parallel, NCCL for the others
* conditionally use UCC
* make UCC guards more friendly
* test raises when torch_ucc isn't available
* Change to member variable from class variable
* pass async_comm to train; I mistakenly dropped it during the rebase
* fix typo: functionality
* Enable tensor parallel only when device count > 4. I want the pipeline model parallel world size to be >= 4 because I previously saw GPT/BERT failing when only UCC is used, so I'm speculating that there's some gotcha around a pipeline size of 4.
* Add NVIDIA driver version guard
* move world_size as it was not correctly reflected
* keep an eye on the nvml API thing
* import unittest

Co-authored-by: Aidyn Aitzhan <31858918+Aidyn-A@users.noreply.github.com>
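The backend mix this commit describes (UCC for the pipeline-parallel group, NCCL for the others) can be sketched roughly as below. This is a minimal illustration, assuming a PyTorch build with the UCC backend (or torch_ucc plugin) available; the rank layout is hypothetical, and only the `UCX_TLS` value is taken from the commit message.

```python
import os

import torch.distributed as dist

# From the commit message: restrict UCX transports to TCP and CUDA copy.
os.environ["UCX_TLS"] = "tcp,cuda_copy"

# Default (world) process group on NCCL; assumes launch via torchrun so the
# usual MASTER_ADDR/RANK/WORLD_SIZE env vars are already set.
dist.init_process_group(backend="nccl")

# Hypothetical pipeline-parallel subgroup placed on the UCC backend, while
# tensor-/data-parallel communication stays on NCCL.
pp_group = dist.new_group(ranks=[0, 1, 2, 3], backend="ucc")
```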
-
- 13 May, 2022 1 commit
-
-
Masaki Kozuki authored
-
- 12 May, 2022 1 commit
-
-
eqy authored
* initial check in
* fix
* fix test
* address some review comments and cleanup
* fix
* bookmark
* fix sync placement to come before gather
* similar fix for non-gather case
* add async BERT
* update GPT minimal test
* allow selection of default PP test
* fix BERT test
* cleanup
* cleanup
-
- 11 May, 2022 1 commit
-
-
Aidyn-A authored
* add loss comparison to test_pipeline_parallel_fwd_bwd
* applied some suggested changes
* update test_pipeline_parallel_fwd_bwd.py
* update test_pipeline_parallel_fwd_bwd.py 2
* minor update
* update test_pipeline_parallel_fwd_bwd.py 3
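The loss comparison this commit adds can be sketched as follows; the tensor values and tolerances here are assumptions for illustration, not the test's actual numbers.

```python
import torch

# Hypothetical per-step losses: one from the pipeline-parallel forward/backward
# run, one from a plain single-process reference run of the same model.
pp_loss = torch.tensor(2.3079)
reference_loss = torch.tensor(2.3081)

# assert_close raises with a readable diff message if the values diverge.
torch.testing.assert_close(pp_loss, reference_loss, rtol=1e-3, atol=1e-3)
```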
-
- 29 Apr, 2022 3 commits
-
-
eqy authored
* fix typo
* Update test_pipeline_parallel_fwd_bwd.py
-
Masaki Kozuki authored
This is cherry-picked for easier comparison with megatron-lm.
-
yjk21 authored
-
- 21 Apr, 2022 1 commit
-
-
Masaki Kozuki authored
* guard
* update
* remove unnecessary version guard
* runtime version guard
* cosmetic
* skip tests appropriately
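A minimal sketch of the runtime version guard pattern these commits describe, assuming the `packaging` library is available; the threshold version, helper name, and test class are illustrative, not the commit's actual code.

```python
import unittest

import torch
from packaging.version import parse as parse_version

# Illustrative threshold; the real guarded version lives in the commit itself.
_MINIMUM_TORCH = parse_version("1.11.0")


def _torch_new_enough() -> bool:
    # torch.__version__ may carry a suffix like "+cu113"; parse() handles it.
    return parse_version(torch.__version__) >= _MINIMUM_TORCH


@unittest.skipIf(not _torch_new_enough(), "requires torch >= 1.11")
class GuardedCase(unittest.TestCase):
    def test_guarded_feature(self):
        self.assertTrue(True)  # placeholder for the actual guarded test
```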
-
- 20 Apr, 2022 1 commit
-
-
Thor Johnsen authored
Peer memory halo exchange
-
- 19 Apr, 2022 1 commit
-
-
Masaki Kozuki authored
* bump version
* add guard
* fix the cond
-
- 14 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 13 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 08 Apr, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 07 Apr, 2022 2 commits
-
-
Masaki Kozuki authored
* add warning to pyprof
* add warning to reparameterization

note: this module is already not import-able, as follows:
```
(base) root@c4bb3f161482:/vscode/apex# python -c 'import torch; import apex; from apex import reparameterization'
/vscode/apex/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
  warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
/vscode/apex/apex/reparameterization/__init__.py:2: FutureWarning: reparameterization will be removed by the end of June, 2022
  warnings.warn("reparameterization will be removed by the end of June, 2022", FutureWarning)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/vscode/apex/apex/reparameterization/__init__.py", line 4, in <module>
    from .weight_norm import WeightNorm
  File "/vscode/apex/apex/reparameterization/weight_norm.py", line 3, in <module>
    from ..fp16_utils import Fused_Weight_Norm
ImportError: cannot import name 'Fused_Weight_Norm' from 'apex.fp16_utils' (/vscode/apex/apex/fp16_utils/__init__.py)
```
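The deprecation pattern the commit adds, and which the output above shows firing, boils down to a module-level `warnings.warn` in the package's `__init__.py`. A minimal sketch, with the message text taken from the output above; the exact file contents are an assumption.

```python
# apex/pyprof/__init__.py (sketch)
import warnings

# Fires once at import time for anyone still importing the deprecated package.
warnings.warn(
    "pyprof will be removed by the end of June, 2022",
    FutureWarning,
)
```
-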
Masaki Kozuki authored
* add test
* `destroy_model_parallel` was missing
-
- 05 Apr, 2022 2 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 03 Apr, 2022 1 commit
-
-
Thor Johnsen authored
-
- 02 Apr, 2022 4 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 01 Apr, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 31 Mar, 2022 3 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 30 Mar, 2022 2 commits
-
-
Gil Shomron authored
* Enabled Conv-Bias-ReLU fusion

  The following modules are enabled using cuDNN runtime fusion:
  1) Conv-Bias-ReLU (+backward)
  2) Conv-Bias (+backward)
  3) Conv-Bias-Mask-ReLU (+backward)

* Casts cleanup and autocast in unittest
  - Remove redundant dtype casts
  - Simulate the usage in the unittest by using torch.cuda.amp.autocast

* Fixed save_for_backward

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
Co-authored-by: root <root@luna-0277.selene.nvidia.com>
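The fused modules themselves live inside apex, but the unfused reference computation that the cuDNN runtime fusion replaces can be sketched as below, run under `torch.cuda.amp.autocast` as the unittest change describes. Shapes are illustrative, and a CUDA device is assumed.

```python
import torch
import torch.nn.functional as F

# Inputs for the conv-bias-ReLU pattern targeted by the runtime fusion.
x = torch.randn(8, 64, 56, 56, device="cuda")
weight = torch.randn(128, 64, 3, 3, device="cuda")
bias = torch.randn(128, device="cuda")

# Under autocast the convolution runs in fp16, matching the unittest setup;
# the fused apex module should produce a numerically close result.
with torch.cuda.amp.autocast():
    out = F.relu(F.conv2d(x, weight, bias, padding=1))
```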
-
Thor Johnsen authored
-
- 29 Mar, 2022 2 commits
-
-
Thor Johnsen authored
-
Thor Johnsen authored
-
- 28 Mar, 2022 1 commit
-
-
Thor Johnsen authored
-
- 25 Mar, 2022 3 commits
-
-
yjk21 authored
-
Thor Johnsen authored
-
Thor Johnsen authored
-