- 23 Jan, 2026 2 commits
-
-
zc20020701 authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
See merge request dcutoolkit/deeplearing/TransformerEngine!72
Co-authored-by: zhaochao <zhaochao1@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 21 Jan, 2026 1 commit
-
-
maxiao3 authored
Signed-off-by:maxiao3 <maxiao3@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!71
-
- 12 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 09 Jan, 2026 3 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wuyf1 authored
## Summary
Fix swizzle / swap_first_dims RTC build and normalization test issues on `release_v2.7` (ROCm/HIP).

## Background
- The ROCm/HIP path currently hits build/runtime/test issues in:
  - `swizzle_scaling_factors` (HIP compile constraints with `__device__ __host__` constexpr)
  - RTC `swap_first_dims` source selection
  - `test_normalization` when `use_cudnn` is enabled for LayerNorm/RMSNorm
  - PyTorch L0 unittest environment relying on `PYTHONPATH`

## Changes
1) **qa/L0_pytorch_unittest/test.sh**
   - Export `PYTHONPATH` to include `${TE_PATH}` so tests can import from source tree without reinstalling pytest.
   - Removed explicit `pip3 install pytest==8.2.1` from the script.
2) **tests/cpp/operator/test_normalization.cu**
   - Skip LayerNorm/RMSNorm cases when `use_cudnn` is enabled:
     - `GTEST_SKIP(): CudnnLayerNorm and CudnnRmsNorm are disabled.`
   - Avoids running unsupported/disabled cuDNN normalization paths in this configuration.
3) **transformer_engine/common/CMakeLists.txt**
   - Fix RTC header generation for `swap_first_dims` on ROCm:
     - use `transpose/rtc/swap_first_dims.hip` instead of `.cu`.
4) **transformer_engine/common/swizzle/swizzle.cu**
   - For `__HIP_PLATFORM_AMD__`, replace `constexpr __device__ __host__ int ...` with plain `constexpr int ...`
   - Keeps CUDA path unchanged.
   - Addresses HIP compilation constraints while preserving the constants' values and usage.

## Verification
- [x] Build on 10.16.4.9 rocky_8.6 docker environment
- [x] Run `qa/L0_pytorch_unittest/test.sh`
- [x] Run C++ operator tests related to normalization/swizzle as applicable

## Notes
- Branch synced with latest `origin/release_v2.7` before opening this MR.

See merge request dcutoolkit/deeplearing/TransformerEngine!66
-
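The `swizzle.cu` change in item 4 can be illustrated with a minimal sketch. The constant `kExampleTileDim`, its value 128, and the helper `round_up_to_tile` are hypothetical placeholders; the actual constants in `swizzle.cu` are not shown in this MR description. The extra `__CUDACC__` check is also an assumption, added only so the sketch compiles with a plain host compiler.

```cpp
#include <cassert>

// Hypothetical constant, for illustration only (not taken from swizzle.cu).
// On ROCm/HIP (and, as an assumption of this sketch, on plain host compilers)
// the constant is declared as plain constexpr; under nvcc the original
// __device__ __host__ qualifiers are kept, so the CUDA path is unchanged.
#if defined(__HIP_PLATFORM_AMD__) || !defined(__CUDACC__)
constexpr int kExampleTileDim = 128;
#else
constexpr __device__ __host__ int kExampleTileDim = 128;
#endif

// Hypothetical helper that uses the constant the same way on both paths:
// rounds n up to the next multiple of kExampleTileDim.
constexpr int round_up_to_tile(int n) {
  return (n + kExampleTileDim - 1) / kExampleTileDim * kExampleTileDim;
}

// The value and its compile-time usability are identical on either branch.
static_assert(round_up_to_tile(130) == 256, "130 rounds up to two tiles");

// Runtime check mirroring the compile-time one.
inline void self_check() {
  assert(round_up_to_tile(128) == 128);
}
```

Because both branches define the same name with the same value, code that consumes the constant does not need its own platform guards.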
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 07 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 19 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 18 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 15 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 13 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 11 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
Mutex group gemm
Signed-off-by: wenjh <wenjh@sugon.com>
do while group gemm
Signed-off-by: wenjh <wenjh@sugon.com>
Remove mutex
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 26 Nov, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 12 Nov, 2025 5 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
-
wenjh authored
-
wenjh authored
-
- 08 Nov, 2025 1 commit
-
-
wenjh authored
-
- 31 Oct, 2025 2 commits
- 30 Oct, 2025 2 commits
-
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-
- 23 Oct, 2025 3 commits
-
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
wenjh authored
-
- 20 Oct, 2025 6 commits
-
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
tabuchixiangcai3 authored
[DCU] Fix MPI root support, enable int8 simulation, and fix batched_linear accessing non-existent main_grad
Signed-off-by: Tangao <2205747538@qq.com>
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by:zhaochao <zhaochao1@sugon.com>
-
- 17 Oct, 2025 2 commits
-
-
wenjh authored
[DCU] Fix memory overflow and test-distributed in L1_pytorch_distributed_unittest
See merge request dcutoolkit/deeplearing/TransformerEngine!48
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-
- 16 Oct, 2025 5 commits
-
-
yuguo authored
Release v2.7 See merge request dcutoolkit/deeplearing/TransformerEngine!50
-
dongcl authored
-
dongcl authored
-
yuguo authored
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-