Fix swizzle, swap_first_dims and RMSNorm issues on release_v2.7 (Rocky 8.6)
## Summary
Fix swizzle / swap_first_dims RTC build and normalization test issues on `release_v2.7` (ROCm/HIP).
## Background
- The ROCm/HIP path currently hits build/runtime/test issues in:
  - `swizzle_scaling_factors` (HIP compile constraints with `__device__ __host__` constexpr)
  - RTC `swap_first_dims` source selection
  - `test_normalization` when `use_cudnn` is enabled for LayerNorm/RMSNorm
  - PyTorch L0 unittest environment relying on `PYTHONPATH`
## Changes
1) **qa/L0_pytorch_unittest/test.sh**
- Export `PYTHONPATH` to include `${TE_PATH}` so tests can import from the source tree without reinstalling pytest.
- Remove the explicit `pip3 install pytest==8.2.1` from the script.
2) **tests/cpp/operator/test_normalization.cu**
- Skip LayerNorm/RMSNorm cases when `use_cudnn` is enabled:
  - `GTEST_SKIP(): CudnnLayerNorm and CudnnRmsNorm are disabled.`
- This avoids running unsupported/disabled cuDNN normalization paths in this configuration.
3) **transformer_engine/common/CMakeLists.txt**
- Fix RTC header generation for `swap_first_dims` on ROCm:
  - Use `transpose/rtc/swap_first_dims.hip` instead of `.cu`.
4) **transformer_engine/common/swizzle/swizzle.cu**
- For `__HIP_PLATFORM_AMD__`, replace `constexpr __device__ __host__ int ...` with plain `constexpr int ...`.
- Keeps the CUDA path unchanged.
- Addresses HIP compilation constraints while preserving the constants’ values and usage.
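
The swizzle change in item 4 can be illustrated with a minimal sketch. This is not the code from `swizzle.cu`: the macro `SKETCH_HOST_DEVICE`, the constant `kSketchTileDim`, its value, and both helper functions are hypothetical names chosen for illustration. It only shows the shape of the guard: a plain `constexpr int` when `__HIP_PLATFORM_AMD__` is defined, and the qualified form otherwise.

```cpp
#include <cassert>

// Sketch only: every name below is hypothetical and not taken from swizzle.cu.

// Expand to the CUDA qualifiers only when a CUDA compiler defines __CUDACC__,
// so this file also builds as plain host C++.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __device__ __host__
#else
#define SKETCH_HOST_DEVICE
#endif

#if defined(__HIP_PLATFORM_AMD__)
// ROCm/HIP path: plain constexpr, no execution-space qualifiers.
constexpr int kSketchTileDim = 16;
#else
// CUDA path: the qualified declaration form is kept as it was.
constexpr SKETCH_HOST_DEVICE int kSketchTileDim = 16;
#endif

// Round `n` up to the next multiple of the tile dimension (host-side helper).
constexpr int sketch_round_up_to_tile(int n) {
  return ((n + kSketchTileDim - 1) / kSketchTileDim) * kSketchTileDim;
}

// Runtime wrapper that rejects negative sizes before rounding.
inline int sketch_padded_rows(int rows) {
  assert(rows >= 0);
  return sketch_round_up_to_tile(rows);
}

// The constant has the same value and is usable in constant expressions
// on either branch of the guard.
static_assert(kSketchTileDim == 16, "same value on both branches");
static_assert(sketch_round_up_to_tile(17) == 32, "usable at compile time");
```

Because only the declaration form differs between the two branches, code that reads the constant does not need its own platform guard.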
## Verification
- [x] Build in the 10.16.4.9 rocky_8.6 Docker environment
- [x] Run `qa/L0_pytorch_unittest/test.sh`
- [x] Run C++ operator tests related to normalization/swizzle as applicable
## Notes
- Branch synced with latest `origin/release_v2.7` before opening this MR.
See merge request dcutoolkit/deeplearing/TransformerEngine!66
File mode change: `qa/L0_pytorch_unittest/test.sh` 100644 → 100755