Commits · 1036ccfec1307bc35f8aacb1361862b5a259b64d · OpenDAS / TransformerEngine

17 Jul, 2025 2 commits
- Use lightop to replace w8a8_matmul_extension · 1036ccfe
  wenjh authored Jul 17, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  1036ccfe
- [DCU] fix · 00738a42
  yuguo authored Jul 17, 2025
  
  00738a42
16 Jul, 2025 2 commits
- [DCU] support NVTE_USE_HIPBLASLT_GROUPEDGEMM · 9406ff31
  yuguo authored Jul 16, 2025
  
  9406ff31
- Fix import error in test_batched_linear · bc2d9697
  wenjh authored Jul 16, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  bc2d9697
15 Jul, 2025 3 commits
- Fix pytorch module import error · 148b5bea
  wenjh authored Jul 15, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  148b5bea
- [DCU] support channelwise int8 train · 793e0103
  yuguo authored Jul 15, 2025
  
  793e0103
- Fix import pytorch error · 3939e719
  wenjh authored Jul 15, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  3939e719
11 Jul, 2025 2 commits
- Merge branch 'develop_v2.4' into w8a8_dev_v2.4 · 3b1f30a9
  wenjh authored Jul 11, 2025
  
  3b1f30a9
- Support w8a8_matmul_extension · 6a20ff90
  wenjh authored Jul 11, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  6a20ff90
09 Jul, 2025 2 commits
- [DCU] channelwise batchgemm for MOE · 76023d21
  yuguo authored Jul 09, 2025
  
  76023d21
- Fix int8 gemm nt and wgrad · 5fcf30ba
  wenjh authored Jul 09, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  5fcf30ba
08 Jul, 2025 1 commit
- [DCU] Preliminary support for channelwise · 9fe13a33
  yuguo authored Jul 08, 2025
  
  9fe13a33
03 Jul, 2025 1 commit
- Fix kernel crash on block_len=64 · 40a4d896
  wenjh authored Jul 03, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  40a4d896
01 Jul, 2025 1 commit
- [Blockwise] Add support for block_len=64 · b944277c
  wenjh authored Jun 25, 2025
  
  Add env to choose the block length for blockwise quantization.
  Signed-off-by: wenjh <wenjh@sugon.com>
  
  Fix blockwise pytest error
  Signed-off-by: wenjh <wenjh@sugon.com>
  
  Resolve new API in int8 gemm test
  Signed-off-by: wenjh <wenjh@sugon.com>
  
  Fix incorrect launch param
  Signed-off-by: wenjh <wenjh@sugon.com>
  
  Fix 1D blockwise(64) acc error
  Signed-off-by: wenjh <wenjh@sugon.com>
  
  b944277c
20 Jun, 2025 2 commits
- [DCU] fix megatron MOE int8 train bugs · 251dcc7e
  yuguo authored Jun 20, 2025
  
  251dcc7e
- [DCU] fix megatron MOE int train issues · 7640a8d4
  yuguo authored Jun 20, 2025
  
  7640a8d4
19 Jun, 2025 2 commits
- [DCU] add TORCH_COMM_CU_NUMS and fix · d6c32078
  yuguo authored Jun 19, 2025
  
  d6c32078
- Fix verify acc failed of blockwise quantizer · 8eff19c9
  wenjh authored Jun 19, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  8eff19c9
18 Jun, 2025 2 commits
- Fix vector blockwise acc problem · 8a03ff34
  wenjh authored Jun 18, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  8a03ff34
- Fix lack of lds in vector_blockwise · d1bf39cf
  wenjh authored Jun 18, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  d1bf39cf
16 Jun, 2025 1 commit
- [DCU] fix int8 simul fp8 fused wgrad accumulation · 3653fbfb
  yuguo authored Jun 16, 2025
  
  3653fbfb
13 Jun, 2025 1 commit
- [DCU] fix blockwise int8 train issues in megatron · ecdd8251
  yuguo authored Jun 13, 2025
  
  ecdd8251
12 Jun, 2025 2 commits
- [INT8] Make int8 rounding instead of truncation · 7f946529
  wenjh authored Jun 12, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  7f946529
- [Workaround] Improve acc of vectorise scaling · e2860c76
  wenjh authored Jun 12, 2025
```
Same intention as commit 3e38a2ea.
This commit is to improve acc.
Signed-off-by: wenjh <wenjh@sugon.com>
```
  e2860c76
11 Jun, 2025 1 commit
- [DCU] add NVTE_TP_OVERLAP_AGGREGATE · b1864da3
  yuguo authored Jun 11, 2025
  
  b1864da3
10 Jun, 2025 1 commit
- [DCU] avoid rtc trans kernel bug (need fix) · fdb21575
  yuguo authored Jun 10, 2025
  
  fdb21575
09 Jun, 2025 4 commits
- Fix build error of test_cublaslt_gemm · 7d2b9c77
  wenjh authored Jun 09, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  7d2b9c77
- [DCU] fix · 6d461a10
  yuguo authored Jun 09, 2025
  
  6d461a10
- [DCU] support cast master weight to int8 · 0a8072fa
  yuguo authored Jun 09, 2025
  
  0a8072fa
- [TEST] Fix build error of test_cublaslt_gemm · 2cbe1b70
  wenjh authored Jun 09, 2025
```
Signed-off-by: wenjh <wenjh@sugon.com>
```
  2cbe1b70
06 Jun, 2025 1 commit
- [Workaround] Use bf16 lds to save fp32 input · 3e38a2ea
  wenjh authored Jun 06, 2025
  
  The quantize_transpose_vector_blockwise function uses more than 64 KB of LDS when
  the input type is fp32, but the maximum LDS size on DCU is 64 KB, so as a
  workaround the LDS buffer holds the data as bf16.
  Signed-off-by: wenjh <wenjh@sugon.com>
  
  3e38a2ea
05 Jun, 2025 2 commits
- Merge branch 'develop_v2.3' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine · 32184507
  yuguo authored Jun 05, 2025
  
  32184507
- [DCU] support block fp8 simu with int8 for MOE · b7afba08
  yuguo authored Jun 05, 2025
  
  b7afba08
04 Jun, 2025 4 commits
- Merge branch 'develop_v2.3' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine · 1b303e91
  yuguo authored Jun 04, 2025
  
  1b303e91
- fix · 735227cd
  yuguo authored Jun 04, 2025
  
  735227cd
- Merge branch 'develop_v2.3' of... · 129d7526
  yuguo authored Jun 04, 2025
```
Merge branch 'develop_v2.3' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into develop_v2.3
```
  129d7526
- [DCU] support block fp8 simu with int8 for Dense · f6937668
  yuguo authored Jun 04, 2025
  
  f6937668
28 May, 2025 2 commits
- Merge branch 'develop_v2.3' · 52ba87a1
  wenjh authored May 28, 2025
  
  52ba87a1
- [Workaround] Dtk-25.04.1 need add hip_assert.h for hiprtc · 7e4e1e40
  wenjh authored May 28, 2025
  
  7e4e1e40
27 May, 2025 1 commit
- Merge branch 'develop_v2.3' · 74128807
  wenjh authored May 27, 2025
  
  74128807