- 17 Apr, 2025 1 commit
linxiddd authored
* [QA] Add error handling: standardize test failure handling using the unified `test_fail` and `error_exit` functions
* Add XML log generation for pytest results: pass the `--junitxml` option to pytest to emit JUnit-format XML logs
* Write the logs to `$XML_LOG_DIR`, creating the directory first (mkdir)
* Update qa/L0_pytorch_unittest/test.sh

Signed-off-by: Linxi Ding <linxid@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
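For context on the test-script changes above, a minimal Python sketch of the same pattern: run each test module under the current interpreter, write a JUnit XML report into `$XML_LOG_DIR`, and fail loudly on error. The runner, paths, and messages below are illustrative, not the actual contents of test.sh.

```python
import os
import subprocess
import sys

# Create the XML log directory first (the "mkdir" step above).
xml_log_dir = os.environ.get("XML_LOG_DIR", "logs")
os.makedirs(xml_log_dir, exist_ok=True)

# Run pytest as a module of the current interpreter and request a
# JUnit-format XML report via --junitxml.
result = subprocess.run(
    [sys.executable, "-m", "pytest",
     f"--junitxml={xml_log_dir}/pytest_test_sanity.xml",
     "tests/pytorch/test_sanity.py"],
)
if result.returncode != 0:
    # Stand-in for the script's unified test_fail / error_exit handling.
    sys.exit(f"FAIL: test_sanity (exit code {result.returncode})")
```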
- 22 Mar, 2025 1 commit
Kunlun Li authored
* Enable fp8_primary_weights for current scaling
* Use different cast_master_weights_to_fp8 functions depending on the type of quantizer
* All amaxes of model_weights should participate in the reduce-max
* Clear _high_precision_init_val automatically in the cast_master_weights_to_fp8 function
* Merge all all-reduces on amaxes into one NCCL kernel
* Add unit tests for multi_tensor_compute_scale_and_scale_inv and preserve_high_precision_init_val
* Fix conflicts
* Add unit test for cast_master_weights_to_fp8
* Use a mock group to initialize fp8_autocast, to avoid reduction of amax_history by fp8_autocast_exit
* Remove with_computing_amax and with_computing_scale
* Move replace_raw_data from QuantizedTensor to utils.py
* Remove the allow_empty_output argument from nvte_compute_amax and set it to always be true
* Rename the import guard of recipe_common.cuh to align with the other import guards
* Add unit test for replace_raw_data; add test_replace_raw_data to qa/L0_pytorch_unittest/test.sh
* Minor changes in comments
* Add randomness to the replace_raw_data unit test
* (Maybe needs revert) Add tex.quantize_to_fragment
* (Maybe needs revert) Use nvte_quantize_noop in quantize_to_fragment
* Fix lint error
* Move the high_precision_init_val and replace_raw_data tests to test_sanity.py
* Remove test_fp8_model_init.py and test_replace_raw_data.py
* Remove cast_master_weights_to_fp8 and replace_raw_data from __all__ in tensor.__init__.py
* Move FP8 casting logic back from C++ tex funcs to Python
* Remove unimplemented function from header
* [pre-commit.ci] auto fixes from pre-commit.com hooks (applied repeatedly; see https://pre-commit.ci)

Signed-off-by: kunlunl <kunlunl@nvidia.com>
Signed-off-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
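The "merge all all-reduces on amaxes into one NCCL kernel" item follows a standard batching pattern: pack the per-weight amax scalars into one buffer and reduce it once. A minimal sketch, assuming an initialized `torch.distributed` process group; the function name and tensor layout are illustrative, not TE's internals.

```python
import torch
import torch.distributed as dist

def reduce_amaxes(amaxes: list[torch.Tensor], group=None) -> None:
    """Reduce-max every weight's amax across ranks with a single NCCL call.

    Instead of launching one all-reduce per weight, pack all scalar amaxes
    into one contiguous buffer, all-reduce it once with MAX, and copy the
    results back, so every model weight's amax participates in the
    reduce-max.
    """
    packed = torch.stack([a.reshape(()) for a in amaxes])  # shape [num_weights]
    dist.all_reduce(packed, op=dist.ReduceOp.MAX, group=group)
    for amax, reduced in zip(amaxes, packed):
        amax.copy_(reduced)
```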
- 17 Mar, 2025 1 commit
linxiddd authored
* [QA] Add error handling: standardize test failure handling using the unified `test_fail` and `error_exit` functions
* Update scripts to use explicit python3, pip3, and python3 -m pytest calls (pip -> pip3, python -> python3, pytest -> python3 -m pytest)

Signed-off-by: Linxi Ding <linxid@nvidia.com>
- 13 Mar, 2025 1 commit
Tim Moon authored
* Explicitly use python3 and pip3
* Run pre-commit as a Python module
* Replace some missed references to "python" or "pip"

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 24 Feb, 2025 1 commit
Paweł Gadziński authored
* Non-exit tests
* fix
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 07 Feb, 2025 1 commit
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 16 Dec, 2024 1 commit
Youngeun Kwon authored
* Draft implementation of FSDP2 FP8 all-gather
* Fix the convergence issue
* Add warning
* Disable/fix lint errors (several passes)
* Add comments
* Add reference
* Add related tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks (applied repeatedly; see https://pre-commit.ci)

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
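For orientation, the payoff of an FSDP2 FP8 all-gather is bandwidth: if parameters live natively in FP8, the all-gather moves one byte per element instead of two (bf16) or four (fp32). A minimal usage sketch under a torchrun launch, using TE's public `fp8_model_init`; the exact integration points of this commit may differ.

```python
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te
from torch.distributed._composable.fsdp import fully_shard

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# With fp8_model_init, module parameters are created directly as FP8
# tensors, so FSDP2's all-gather moves the 1-byte FP8 payload.
with te.fp8_model_init():
    model = te.Linear(4096, 4096, device="cuda")
fully_shard(model)  # shard parameters; all-gather them on demand
```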
- 06 Nov, 2024 1 commit
Tim Moon authored
* Add Userbuffers support for column TP linear layer
* Add Userbuffers support for row TP linear layer
* Interpret linear+RS as row TP linear
* Add Userbuffers support for FP8 row TP linear layer (assumes FP8 RS, which is not a good assumption)
* Debug incorrect bias pointers in UB GEMM: bias pointers were not properly offset for different data chunks; also removed the FP8 RS logic
* Add Userbuffers support for linear dgrad (test passes with row TP, fails with col TP)
* Add Userbuffers support for linear wgrad
* Add support for grad bias
* Fused cast-transpose-dbias
* Support case where wgrad is optional
* Expand documentation
* Fix linter warnings
* Use recently added convenience functions in Float8Tensor
* Respect autograd dtype; respect PyTorch autocast dtype in bprop
* Fix missing imports
* Debug merge conflicts
* [pre-commit.ci] auto fixes from pre-commit.com hooks (applied repeatedly; see https://pre-commit.ci)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
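For readers unfamiliar with the "linear+RS as row TP linear" framing: in row tensor parallelism each rank holds a slice of the weight's input dimension, computes a partial product, and the partial outputs are summed and scattered across ranks. A minimal sketch without Userbuffers, assuming an initialized process group; names and shapes are illustrative.

```python
import torch
import torch.distributed as dist

def row_tp_linear_rs(x_local: torch.Tensor, w_local: torch.Tensor, group=None):
    """Row tensor-parallel linear with a reduce-scatter on the output.

    x_local: [seq, in_features // tp_size]  local slice of the input
    w_local: [out_features, in_features // tp_size]  local slice of the weight
    Returns: [seq // tp_size, out_features]  this rank's shard of the output.
    """
    partial = x_local @ w_local.t()  # partial sums over the local K slice
    out = torch.empty(
        partial.shape[0] // dist.get_world_size(group), partial.shape[1],
        dtype=partial.dtype, device=partial.device,
    )
    # Sum the partial products across ranks and scatter rows of the result.
    # Userbuffers' contribution is overlapping this communication with the GEMM.
    dist.reduce_scatter_tensor(out, partial, group=group)
    return out
```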
- 18 Oct, 2024 1 commit
Tim Moon authored
* Reorganize PyTorch L1 tests
* Move ONNX tests to L1
* Move FA version test to L3
* Limit parallel build jobs in FA version test

Signed-off-by: Tim Moon <tmoon@nvidia.com>
- 01 Oct, 2024 1 commit
Kirthi Shankar Sivamani authored
Fix

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 03 Sep, 2024 1 commit
Kirthi Shankar Sivamani authored
* Improvements for wheels
* Fixes for wheel build
* Move package finder to common
* Formatting and lint fixes
* Fix CI and distributed test
* Fix Paddle CI
* Assorted small fixes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 09 Aug, 2024 1 commit
Alp Dener authored
[C/PyTorch] Fixed incorrect use of `torch.distributed.new_group()` when creating the intra-node group in `initialize_ub()` (#1087)

* Updated initialize_ub() to use new_subgroups_by_enumeration() to generate intra-node groups; added new unit tests for TE layers with comm overlap
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)

Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
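The bug class fixed here is worth spelling out: `torch.distributed.new_group()` is a collective that every rank must call with the same rank lists, so building only the local node's group desynchronizes the job. `new_subgroups_by_enumeration()` avoids that by taking the full enumeration on every rank and returning the subgroup the caller belongs to. A minimal sketch under a torchrun launch; the grouping below is illustrative, not the exact initialize_ub() logic.

```python
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
world_size = dist.get_world_size()
gpus_per_node = torch.cuda.device_count()

# Every rank enumerates ALL intra-node rank lists; the call is collective,
# and each rank receives the subgroup it belongs to. Calling new_group()
# with only the local node's ranks would hang or desynchronize other nodes.
node_rank_lists = [
    list(range(start, start + gpus_per_node))
    for start in range(0, world_size, gpus_per_node)
]
intra_node_group, _ = dist.new_subgroups_by_enumeration(node_rank_lists)
```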
- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 07 Dec, 2023 1 commit
cyanguwa authored
* Integrate cuDNN frontend v1 into fused attention, plus miscellaneous fixes
* Fix jax/pytorch lint and jax/paddle unit tests
* Simplify stride generation
* Fix and/or logic in get_backend
* Fix flag_max512 and test_numerics
* Remove v.contiguous() since get_qkv_layout covers it
* Skip FP8 tests for sm89
* Fix Jax CI (several passes)
* Revert mask type to comma-separated list
* Fix last two commits
* Integrate v1/pre-release-5; clean up the pre-release-5 integration and fix the FA 2.1 commit
* Force dropout to 0 if not training
* Test bias/alibi and padding+causal; add alibi to unfused DPA
* Set flag_arb to false when non-determinism is not allowed; follow-up: remove redundant Python env var setting
* Minor tweaks for tests and test preparation
* Fix determinism logic for fused attention
* Add bias to bwd
* Fix gpt_checkpointing/dpa_accuracy problem
* Fix some seg-fault issues
* Add failure notes
* Remove use of the non-determinism var for backend selection
* Minor fixes for lint and CI
* Fix workspace size in bwd and uncomment bias test
* Fix get_alibi and remove check_support
* Update tests status
* Remove workspace_opt from FADescriptor_v1
* Disable arbitrary backend + post-scale bias in Jax; waiting on PR 525
* Clean up bhsd order
* Swap bias/rng_state order in aux_ctx_tensor and add bias to aux_ctx_tensor in the _qkvpacked/_kvpacked API
* Remove support for padding_causal + cross for max512
* Change alibi bias to float32 for the bias_1_4/5 tests
* Further clean up tests
* Fix thd fwd output shape for FlashAttention and add backend info for DPA
* Fix the definition of the workspace limit when dbias is present; further tweak the DP_WORKSPACE_LIMIT definition
* Disallow alibi+no_mask for sdpa flash and update alibi tests
* Update jax/paddle after PR 525 and fix DP_WORKSPACE_LIMIT for dbias Jax tests
* Disable dbias for non-Hopper archs
* Fix layernorm lint; remove unused arg for lint
* Remove build dir in setup.py
* Change selection logic to prefer fused attention on sm90
* Fix distributed jax test
* Fix h and s order in header
* Update to the cuDNN FE v1 branch; remove manual setting of the workspace-opt path (needed for dbias) after the v1 update
* Fix paddle CI
* Add post_scale_bias and alibi to the sdpa flash support matrix; fix the support matrix in header files
* Move headers back to .cu and change seed/offset to int64
* Update the Megatron commit in the L1 test; fix the L1 Megatron test and its fp8 arg; remove all prints in the fused-attention test
* Print only when the debug flag is on
* Remove checkpoint loading to avoid loading other tests' results

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
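Several commits above touch ALiBi support (get_alibi, alibi in unfused DPA, float32 alibi bias). As a refresher on the underlying formula: head h of n gets slope 2^(-8h/n), and the additive bias on the attention logits is that slope times the key-minus-query offset. A minimal sketch of the textbook construction, not necessarily TE's actual get_alibi.

```python
import torch

def alibi_bias(num_heads: int, seq_q: int, seq_k: int) -> torch.Tensor:
    """Build the ALiBi additive bias, shape [num_heads, seq_q, seq_k].

    Head h (1-indexed) gets slope 2**(-8*h/num_heads); the bias decreases
    linearly with query/key distance, so nearby positions are favored.
    """
    slopes = 2.0 ** (-8.0 * torch.arange(1, num_heads + 1) / num_heads)
    q_pos = torch.arange(seq_q).view(1, seq_q, 1)
    k_pos = torch.arange(seq_k).view(1, 1, seq_k)
    # float32 throughout, matching the "change alibi bias to float32" commit.
    return slopes.view(num_heads, 1, 1) * (k_pos - q_pos).float()
```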
- 07 Sep, 2023 1 commit
Kirthi Shankar Sivamani authored
* Initial setup
* Fix testfile
* Fix commit
* Test script
* Fixes
* Add logs
* Add perf summary
* Reviews and improvements
* Generalize GPU count
* Add plots
* Better plot
* Get default file name with time

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>