1. 23 Jun, 2022 1 commit
    • Move distributed Adam unit test to contrib dir (#1406) · 57f890a7
      Tim Moon authored
      * Increase default bucket size in distributed Adam
      
      * Move distributed Adam unit test to contrib tests
      
      Integrate into unit testing framework
      
      * Tweak hyperparameters for dist Adam optimizer test
      
      Improves numerical stability so we can keep tight tolerances, adopting suggestions from @crcrpar.
      
      * Use distributed test infrastructure in distributed Adam unit test
      
      Suggestion from @crcrpar.
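
      As context for the commits above, a minimal usage sketch of the contrib distributed Adam optimizer might look like the following; the import path matches apex's contrib optimizers, but the bucket-size argument name (`bucket_cap_mb`) and its value are assumptions for illustration, not taken from the commit.

      ```python
      # Hedged sketch of constructing apex's contrib distributed Adam optimizer.
      # The bucket_cap_mb name and the value 360 are assumptions; the commit
      # only says the default bucket size was increased.
      import torch
      import torch.distributed as dist
      from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam

      dist.init_process_group(backend="nccl")
      torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

      model = torch.nn.Linear(1024, 1024).cuda()
      optimizer = DistributedFusedAdam(
          model.parameters(),
          lr=1e-3,
          bucket_cap_mb=360,  # assumed knob; larger buckets amortize communication
      )
      ```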
  2. 22 Jun, 2022 2 commits
    • Temporary Solution to Let `FusedAdam` support BFloat16 (#1407) · 81f8ba79
      Masaki Kozuki authored
      * add temporary dispatch of double, float, half, bfloat16
      
      * fusedadam of bfloat16
      
      * Add bfloat16 path to FusedAdam
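
      For reference, exercising `FusedAdam` with bfloat16 parameters (the path this commit adds) could look roughly like the sketch below; the model and shapes are placeholders.

      ```python
      # Hedged sketch: FusedAdam stepping a bfloat16 model, the case the temporary
      # double/float/half/bfloat16 dispatch in this commit is meant to cover.
      import torch
      from apex.optimizers import FusedAdam

      model = torch.nn.Linear(256, 256).to(device="cuda", dtype=torch.bfloat16)
      optimizer = FusedAdam(model.parameters(), lr=1e-3)

      x = torch.randn(32, 256, device="cuda", dtype=torch.bfloat16)
      loss = model(x).float().sum()  # accumulate the loss in fp32 for stability
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()
      ```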
    • Gradient clipping with fused kernels (#1405) · dcb02fcf
      Tim Moon authored
      * Gradient clipping routine with fused kernels
      
      Identical API to PyTorch's; falls back to the PyTorch implementation when not computing the L2 norm.
      
      * Add unit test for gradient clipping
      
      * Add fp16 case to gradient clipping unit test
      
      * Tweaks to grad clipping unit test
      
      Review suggestions from @crcrpar
      
      * Debug gradient clipping tests
      
      When checking that incorrect results produce assertion errors, make sure to generate a discrepancy outside the range of numerical error.
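
      Since the commit describes the routine as API-identical to PyTorch's gradient clipping, a call site looks like the standard PyTorch usage below; the exact apex import path for the fused routine is not given in this log, so only the PyTorch-side API is shown.

      ```python
      # Hedged sketch: the fused routine is described as a drop-in for
      # torch.nn.utils.clip_grad_norm_, so usage mirrors the PyTorch call here.
      import torch

      model = torch.nn.Linear(64, 64).cuda()
      loss = model(torch.randn(8, 64, device="cuda")).sum()
      loss.backward()

      # L2-norm clipping is the fused fast path; per the commit, other norm
      # types fall back to the PyTorch implementation.
      total_norm = torch.nn.utils.clip_grad_norm_(
          model.parameters(), max_norm=1.0, norm_type=2.0
      )
      ```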
  3. 16 Jun, 2022 1 commit
  4. 14 Jun, 2022 3 commits
  5. 13 Jun, 2022 1 commit
  6. 31 May, 2022 1 commit
  7. 20 May, 2022 1 commit
  8. 19 May, 2022 2 commits
  9. 18 May, 2022 1 commit
    • [transformer] Allow for different backend for Pipeline Parallel ProcessGroups (#1380) · 3490b9e1
      Masaki Kozuki authored
      * NcclDistributedTestBase
      
      * fix stupid mistake
      
      * add UCC test
      
      * add UCC backend
      
      * torch ucc tests
      
      * allows for UCC backend
      
      * Set `UCX_TLS` to `tcp,cuda_copy` & Use DDP iff it makes sense
      
      * Apply 4 suggestion(s) to 1 file(s)
      
      * mix&match NCCL & UCC
      
      * use both ucc&nccl in gpt
      
      * UCC for Pipeline Parallel, NCCL for the others
      
      * conditionally use ucc
      
      * make ucc guards more friendly
      
      * test raises when torch_ucc isn't available
      
      * Change from class variable to member variable
      Co-authored-by: Aidyn Aitzhan <31858918+Aidyn-A@users.noreply.github.com>
      
      * pass async_comm to train; I mistakenly dropped it during the rebase
      
      * fix typo: functionality
      
      * Enable tensor parallel only when device count > 4
      
      I want the pipeline model parallel world size to be >= 4 because I
      previously saw GPT/BERT failing when only UCC was used, so I suspect
      there is some gotcha around a pipeline size of 4.
      
      * Add nvidia driver version guard
      Co-authored-by: Aidyn Aitzhan <31858918+Aidyn-A@users.noreply.github.com>
      
      * move world_size as it was not correctly reflected
      
      * keep an eye on the nvml api thing
      
      * import unittest
      Co-authored-by: Aidyn Aitzhan <31858918+Aidyn-A@users.noreply.github.com>
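
      A rough sketch of the mix-and-match described above (NCCL for the default group, UCC for the pipeline-parallel group) is below; the group layout and environment setup are illustrative assumptions, not the apex test's actual topology.

      ```python
      # Hedged sketch: NCCL for the world/default group, UCC for a separate
      # pipeline-parallel group, as the commit messages describe. Requires a
      # UCC-capable PyTorch build (e.g. via torch_ucc); ranks are illustrative.
      import os
      import torch.distributed as dist

      os.environ.setdefault("UCX_TLS", "tcp,cuda_copy")  # as set in the commit

      dist.init_process_group(backend="nccl")  # tensor/data parallel stay on NCCL
      world_size = dist.get_world_size()

      # Point-to-point pipeline traffic goes through a UCC-backed group.
      pipeline_group = dist.new_group(ranks=list(range(world_size)), backend="ucc")
      ```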
  10. 13 May, 2022 1 commit
  11. 12 May, 2022 1 commit
    • Async pipeline parallel (#1373) · 3fe35211
      eqy authored
      * initial check in
      
      * fix
      
      * fix test
      
      * address some review comments and cleanup
      
      * fix
      
      * bookmark
      
      * fix sync placement to come before gather
      
      * similar fix for non-gather case
      
      * add async bert
      
      * update gpt minimal test
      
      * allow selection of default pp test
      
      * fix bert test
      
      * cleanup
      
      * cleanup
  12. 11 May, 2022 1 commit
  13. 29 Apr, 2022 3 commits
  14. 21 Apr, 2022 1 commit
  15. 20 Apr, 2022 1 commit
  16. 19 Apr, 2022 1 commit
  17. 14 Apr, 2022 1 commit
  18. 13 Apr, 2022 1 commit
  19. 08 Apr, 2022 3 commits
  20. 07 Apr, 2022 2 commits
    • Deprecation warning: `pyprof` & `reparameterization` (#1348) · 727a6452
      Masaki Kozuki authored
      * add warning to pyprof
      
      * add warning to reparameterization
      
      note: this module already fails to import, as shown below:
      
      ```
      (base) root@c4bb3f161482:/vscode/apex# python -c 'import torch; import apex; from apex import reparameterization'
      /vscode/apex/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
        warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
      /vscode/apex/apex/reparameterization/__init__.py:2: FutureWarning: reparameterization will be removed by the end of June, 2022
        warnings.warn("reparameterization will be removed by the end of June, 2022", FutureWarning)
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/vscode/apex/apex/reparameterization/__init__.py", line 4, in <module>
          from .weight_norm import WeightNorm
        File "/vscode/apex/apex/reparameterization/weight_norm.py", line 3, in <module>
          from ..fp16_utils import Fused_Weight_Norm
      ImportError: cannot import name 'Fused_Weight_Norm' from 'apex.fp16_utils' (/vscode/apex/apex/fp16_utils/__init__.py)
      ```
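
      The warning lines in the output above imply a module-level deprecation pattern along these lines (reconstructed from the printed warning text, not copied from the commit diff):

      ```python
      # Hedged sketch of the deprecation warning emitted at module import time,
      # e.g. in apex/pyprof/__init__.py, inferred from the FutureWarning output.
      import warnings

      warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
      ```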
    • [transformer] add microbatches test (#1349) · 7d903878
      Masaki Kozuki authored
      * add test
      
      * the destroy model parallel call was missing
  21. 05 Apr, 2022 2 commits
  22. 03 Apr, 2022 1 commit
  23. 02 Apr, 2022 4 commits
  24. 01 Apr, 2022 3 commits
  25. 31 Mar, 2022 1 commit