Commits · 91670b0521dbe944401b69d81e741d25e745f9ef · OpenDAS / TransformerEngine

26 Nov, 2025 2 commits

- Merge branch 'develop_v2.8' into 'main' · 91670b05
  wenjh authored Nov 26, 2025

  [DCU] Skip some tests in test_sanity.py

  See merge request dcutoolkit/deeplearing/TransformerEngine!61

  91670b05

- Merge branch 'fix_develop2.8_zc' into 'develop_v2.8' · 3a040217
  wenjh authored Nov 26, 2025
```
[DCU]Fix some bugs

See merge request dcutoolkit/deeplearing/TransformerEngine!56
```
  3a040217

12 Nov, 2025 4 commits
- Merge branch 'develop_v2.8' into 'main' · e3780e3a
  wenjh authored Nov 12, 2025
```
Fix build error

See merge request dcutoolkit/deeplearing/TransformerEngine!60
```
  e3780e3a
- Fix build error · a622988a
  wenjh authored Nov 12, 2025
  
  a622988a
- Merge branch 'develop_v2.8' into 'main' · a145a62a
  wenjh authored Nov 12, 2025
```
Fix hipblaslt handle manage

See merge request dcutoolkit/deeplearing/TransformerEngine!59
```
  a145a62a
- Fix hipblaslt handle manage · f4bd89eb
  wenjh authored Nov 12, 2025
  
  f4bd89eb
08 Nov, 2025 2 commits
- Merge branch 'develop_v2.8' into 'main' · e32965ff
  wenjh authored Nov 08, 2025
```
Fix user args core dump in mt

See merge request dcutoolkit/deeplearing/TransformerEngine!57
```
  e32965ff
- Fix user args core dump in mt · a13c52ad
  wenjh authored Nov 08, 2025
  
  a13c52ad
03 Nov, 2025 8 commits
- [DCU] fix some bugs in test_numerics.py · f7c66e28
  zhaochao authored Nov 03, 2025
  
  f7c66e28
- [DCU]Skip configurations that FlashAttention does not support · 87682fe2
  zhaochao authored Nov 03, 2025
```
Signed-off-by: zhaochao <zhaochao1@sugon.com>
```
  87682fe2
- [DCU]Resolve the issue of checkpoint test weights not existing. · 9d34e27a
  zhaochao authored Nov 03, 2025
```
Signed-off-by: zhaochao <zhaochao1@sugon.com>
```
  9d34e27a
- [DCU] Fix the bug in test_onnx_export.py under L0 · d5cd815f
  zhaochao authored Nov 03, 2025
```
Signed-off-by: zhaochao <zhaochao1@sugon.com>
```
  d5cd815f
- [DCU] Skip alpha non-1 tests · ef65dd33
  zhaochao authored Nov 03, 2025
```
Signed-off-by: zhaochao <zhaochao1@sugon.com>
```
  ef65dd33
- [DCU] fix bug with cannot import name 'use_lightop_w8a8' from 'transformer_engine.pytorch.utils' · 3d36696b
  zhaochao authored Nov 03, 2025
```
Signed-off-by: zhaochao <zhaochao1@sugon.com>
```
  3d36696b
- [DCU] Skip some tests in test_cuda_graphs.py under L0 · 2fc4b10c
  zhaochao authored Nov 03, 2025
```
Signed-off-by: zhaochao <zhaochao1@sugon.com>
```
  2fc4b10c
- [DCU] Skip some tests in test_sanity.py · 6af7b77d
  zhaochao authored Nov 03, 2025
```
Signed-off-by: zhaochao <zhaochao1@sugon.com>
```
  6af7b77d
31 Oct, 2025 1 commit

- Merge branch 'TE_develop2.8' into 'develop_v2.8' · 3a5755b1
  wenjh authored Oct 31, 2025

  [DCU]Fix memory overflow and test-didistributed in L1_pytorch_istributed_unittest

  See merge request dcutoolkit/deeplearing/TransformerEngine!49

  3a5755b1

17 Oct, 2025 3 commits
- [DCU]Fix the original code · b11d6fca
  tabuchixiangcai3 authored Oct 17, 2025
```
Signed-off-by: Tangao <2205747538@qq.com>
```
  b11d6fca
- Merge branch 'develop_v2.8' into 'develop_v2.8' · 4b65dfa3
  yuguo authored Oct 17, 2025
```
Update activation offload code to align with the official version

See merge request dcutoolkit/deeplearing/TransformerEngine!52
```
  4b65dfa3
- Update activation offload code to align with the official version · 9711d439
  dongcl authored Oct 17, 2025
  
  9711d439
16 Oct, 2025 3 commits
- Merge branch 'develop_v2.8' of... · 712d526a
  yuguo authored Oct 16, 2025
```
Merge branch 'develop_v2.8' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into develop_v2.8
```
  712d526a
- [DCU] remove redundant gemm · 47077129
  yuguo authored Oct 16, 2025
  
  47077129
- [DCU]Fix memory overflow and test-didistributed in L1_pytorch_istributed_unittest · 2a64c9a6
  tabuchixiangcai3 authored Oct 16, 2025
```
Signed-off-by: Tangao <2205747538@qq.com>
```
  2a64c9a6
15 Oct, 2025 3 commits
- Fix typo · a26a0c30
  wenjh authored Oct 15, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  a26a0c30
- [DCU] fix compile issues · aa62d24c
  yuguo authored Oct 15, 2025
  
  aa62d24c
- [DCU] fix compile issues · 8d5cd8c6
  yuguo authored Oct 15, 2025
  
  8d5cd8c6
13 Oct, 2025 1 commit
- Add cuda data types to hipify · 6cd2b2dd
  wenjh authored Oct 13, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  6cd2b2dd
11 Oct, 2025 1 commit
- Merge branch 'nv_main' · 27ddce40
  wenjh authored Oct 11, 2025
  
  27ddce40
09 Oct, 2025 2 commits
- Merge branch 'main' into 'main' · d262ef4c
  yuguo authored Oct 09, 2025
```
support activation offloading

See merge request dcutoolkit/deeplearing/TransformerEngine!44
```
  d262ef4c
- support activation offloading · 6cfcde78
  dongcl authored Oct 09, 2025
  
  6cfcde78
19 Sep, 2025 2 commits
- Changed VERSION to 2.9.0.dev0 · 5b3092a0
  Przemek Tredak authored Sep 19, 2025
```
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
```
  5b3092a0
- [DCU] fix · b15412aa
  yuguo authored Sep 19, 2025
  
  b15412aa
18 Sep, 2025 7 commits

- Fix cuDNN version checks when getting backend and for sm89 kv cache (#2185) · 7f77127c
  Kshitij Lakhani authored Sep 18, 2025

  * Fix cudnn version checks for kv cache for sm89. Add cudnn version check in preparation for 9.14 when getting backend
  Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  * Minor fix for cuDNN version condition check
  Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  ---------
  Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

  7f77127c

- Fix w8a8 lightop restriction · 803be71d
  wenjh authored Sep 18, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  803be71d
- Adapt to changes of hipblaslt · d81f8119
  wenjh authored Sep 18, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  d81f8119
- Enable lightop w8a8 · 3f800f01
  wenjh authored Sep 18, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  3f800f01

- [PyTorch] Support FA3 for MLA and with CP (#1907) · c334fc46
  zhujian authored Sep 18, 2025

  feature(FA3,MLA,CP):
  1. Update FA3 to commit-id 3ba6f82 (tag 2.8.0.post2 with compile error fixed), PR-1604 support hdimQK != hdimV backward
  2. Update get_attention_backend method because FA3 support MLA now
  3. Add CP MLA support for FA3
  4. Add unit tests for FA3 MLA CP
  5. Update attention doc
  Signed-off-by: zhujian <zhujian.whu.cs@gmail.com>

  c334fc46

- [DCU] fix · 00fcd784
  yuguo authored Sep 18, 2025
  
  00fcd784

- [Pytorch] Add Cutlass Grouped GEMM Support for fine-grained MoE Model (#2045) · 8aee1bb7
  alan yang authored Sep 18, 2025

  * feat: add cutlass group gemm support
  Signed-off-by: Min Yang <min.yang@shopee.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  * refactor: refactor multi tensor gemm interface
  Signed-off-by: Min Yang <min.yang@shopee.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  * refactor: refactor nvte_multi_stream_cublas_gemm func and add license info
  Signed-off-by: Min Yang <min.yang@shopee.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  * feat: add unit test for cutlass group gemm
  Signed-off-by: Min Yang <min.yang@shopee.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  * feat: add cutlass support type protect
  Signed-off-by: Min Yang <min.yang@shopee.com>

  * add tests and fix lint
  Signed-off-by: Xin Yao <xiny@nvidia.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  * feat: fix unit tests error
  Signed-off-by: Min Yang <min.yang@shopee.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  * feat: refactor host workspace malloc
  Signed-off-by: Min Yang <min.yang@shopee.com>

  * update cutlass
  Signed-off-by: Xin Yao <xiny@nvidia.com>

  * update cutlass
  Signed-off-by: Xin Yao <xiny@nvidia.com>

  * further relex threshold and add a env var to warn fall back
  Signed-off-by: Xin Yao <xiny@nvidia.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

  ---------
  Signed-off-by: Min Yang <min.yang@shopee.com>
  Signed-off-by: Xin Yao <xiny@nvidia.com>
  Signed-off-by: alan yang <89962857+cassiewilliam@users.noreply.github.com>
  Co-authored-by: Min Yang <min.yang@shopee.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Xin Yao <xiny@nvidia.com>
  Co-authored-by: Phuong Nguyen <phuonguyen@nvidia.com>

  8aee1bb7

17 Sep, 2025 1 commit
- Fix incorrect TP rank calculation when using data parallel (#2179) · eb69fad7
  Daniel Stokes authored Sep 18, 2025
```
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
```
  eb69fad7