- 09 Jan, 2026 1 commit
wuyf1 authored
## Summary

Fix swizzle / swap_first_dims RTC build and normalization test issues on `release_v2.7` (ROCm/HIP).

## Background

- The ROCm/HIP path currently hits build/runtime/test issues in:
  - `swizzle_scaling_factors` (HIP compile constraints with `__device__ __host__` constexpr)
  - RTC `swap_first_dims` source selection
  - `test_normalization` when `use_cudnn` is enabled for LayerNorm/RMSNorm
  - the PyTorch L0 unittest environment, which relies on `PYTHONPATH`

## Changes

1. **qa/L0_pytorch_unittest/test.sh**
   - Export `PYTHONPATH` to include `${TE_PATH}` so tests can import from the source tree without a reinstall.
   - Removed the explicit `pip3 install pytest==8.2.1` from the script.
2. **tests/cpp/operator/test_normalization.cu**
   - Skip LayerNorm/RMSNorm cases when `use_cudnn` is enabled:
     - `GTEST_SKIP() << "CudnnLayerNorm and CudnnRmsNorm are disabled.";`
   - Avoids running unsupported/disabled cuDNN normalization paths in this configuration.
3. **transformer_engine/common/CMakeLists.txt**
   - Fix RTC header generation for `swap_first_dims` on ROCm: use `transpose/rtc/swap_first_dims.hip` instead of `.cu`.
4. **transformer_engine/common/swizzle/swizzle.cu**
   - For `__HIP_PLATFORM_AMD__`, replace `constexpr __device__ __host__ int ...` with plain `constexpr int ...`.
   - Keeps the CUDA path unchanged.
   - Addresses HIP compilation constraints while preserving the constants' values and usage.

## Verification

- [x] Build in the 10.16.4.9 rocky_8.6 Docker environment
- [x] Run `qa/L0_pytorch_unittest/test.sh`
- [x] Run the C++ operator tests related to normalization/swizzle, as applicable

## Notes

- Branch synced with the latest `origin/release_v2.7` before opening this MR.

See merge request dcutoolkit/deeplearing/TransformerEngine!66
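The `swizzle.cu` change described in item 4 can be sketched as follows; the constant name `kTileDim` and its value are hypothetical, chosen only to illustrate the pattern, not copied from the actual file:

```cuda
// Sketch of the HIP fix, assuming an illustrative constant kTileDim.
#if defined(__HIP_PLATFORM_AMD__)
// hipcc rejects __device__ __host__ qualifiers on namespace-scope constexpr
// variables in this configuration; plain constexpr keeps the value usable
// in both host and device code on HIP, with the same value as before.
constexpr int kTileDim = 32;
#else
// CUDA path unchanged.
constexpr __device__ __host__ int kTileDim = 32;
#endif
```

The guard keeps the CUDA build byte-for-byte identical while satisfying hipcc, which is why the fix is confined to the `__HIP_PLATFORM_AMD__` branch.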
- 14 Aug, 2025 1 commit
Kirthi Shankar Sivamani authored
Add launch bounds to swizzle kernel, use empty scale inv

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
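For context, adding launch bounds to a kernel looks roughly like this; the kernel name, signature, and block size below are assumptions for illustration, not Transformer Engine's actual code:

```cuda
#include <cstdint>

constexpr int kThreadsPerBlock = 256;  // assumed block size

// __launch_bounds__(N) promises the compiler that the kernel is never
// launched with more than N threads per block, so it can budget registers
// for that block size instead of the worst case, reducing spills.
__global__ void __launch_bounds__(kThreadsPerBlock)
swizzle_kernel(const uint8_t* __restrict__ in, uint8_t* __restrict__ out,
               int n) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) out[idx] = in[idx];  // placeholder body
}
```

A launch would then use a matching block size, e.g. `swizzle_kernel<<<grid, kThreadsPerBlock>>>(in, out, n);`; launching with more threads per block than the declared bound is a runtime error.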
- 06 Aug, 2025 1 commit
Xin Yao authored
* for loop
* bulk alloc
* multi-tensor swizzle
* pad zeros in swizzle kernels
* unify single- and multi-tensor swizzle
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix empty tensor list
* fix bug for col swizzle
* check context & fix signifiers

---------

Signed-off-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 29 May, 2025 1 commit
Przemyslaw Tredak authored
* Changed the Tensor allocation strategy
* Fixes
* Disable debug flag
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Fix the double free error
* Fix
* Fixed pyTorch recipe extension
* Fix
* Fix
* Hide TensorAllocator and fix the usage in LayerNorm
* Cleaning
* Fix
* Fix permutation

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 27 Mar, 2025 1 commit
yuguo authored
- 07 Feb, 2025 1 commit
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>