1. 09 Jan, 2026 2 commits
    • Fix swizzle, swap_first_dims and RMSNorm issues on release_v2.7 (Rocky 8.6) · b998121c
      wuyf1 authored
      ## Summary
      Fix swizzle / swap_first_dims RTC build and normalization test issues on `release_v2.7` (ROCm/HIP).
      
      ## Background
      - ROCm/HIP path currently hits build/runtime/test issues in:
        - `swizzle_scaling_factors` (HIP compile constraints with `__device__ __host__` constexpr)
        - RTC `swap_first_dims` source selection
        - `test_normalization` when `use_cudnn` is enabled for LayerNorm/RMSNorm
        - PyTorch L0 unittest environment relying on `PYTHONPATH`
      
      ## Changes
      1) **qa/L0_pytorch_unittest/test.sh**
         - Export `PYTHONPATH` to include `${TE_PATH}` so tests can import from the source tree.
         - Remove the explicit `pip3 install pytest==8.2.1` step, so the script no longer reinstalls pytest.
      
      2) **tests/cpp/operator/test_normalization.cu**
         - Skip LayerNorm/RMSNorm cases when `use_cudnn` is enabled:
           - `GTEST_SKIP(): CudnnLayerNorm and CudnnRmsNorm are disabled.`
         - Avoids running unsupported/disabled cuDNN normalization paths in this configuration.
      
      3) **transformer_engine/common/CMakeLists.txt**
         - Fix RTC header generation for `swap_first_dims` on ROCm:
           - use `transpose/rtc/swap_first_dims.hip` instead of `.cu`.
      
      4) **transformer_engine/common/swizzle/swizzle.cu**
         - For `__HIP_PLATFORM_AMD__`, replace `constexpr __device__ __host__ int ...` with plain `constexpr int ...`
         - Keeps CUDA path unchanged.
         - Addresses HIP compilation constraints while preserving constants’ values and usage.
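
A minimal sketch of the pattern described in change 4, assuming the constants are simple integer tile sizes. The macro `EXAMPLE_CONSTANT`, the constant `kExampleTileRows`, its value, and the helper `example_padded_rows` are hypothetical names chosen for illustration and are not taken from `swizzle.cu`. The idea is that the HIP build sees a plain `constexpr int`, while the CUDA build keeps the original qualifiers.

```cpp
#include <cassert>

// Hypothetical macro for this sketch only. On CUDA (nvcc) the constant keeps
// its __device__ __host__ qualifiers; on ROCm/HIP (and plain host compilers)
// it is declared as a plain constexpr.
#if defined(__CUDACC__) && !defined(__HIP_PLATFORM_AMD__)
#define EXAMPLE_CONSTANT constexpr __device__ __host__
#else
#define EXAMPLE_CONSTANT constexpr
#endif

// Assumed tile height; the real constants and values live in swizzle.cu.
EXAMPLE_CONSTANT int kExampleTileRows = 128;

// Rounds a row count up to the next multiple of the tile height, showing that
// the constant is used the same way on both code paths.
inline int example_padded_rows(int rows) {
  assert(rows >= 0);
  return (rows + kExampleTileRows - 1) / kExampleTileRows * kExampleTileRows;
}
```

Because only the declaration differs between the two branches, code that reads the constant does not need its own platform guard.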
      
      ## Verification
      - [x] Build in the rocky_8.6 Docker environment on 10.16.4.9
      - [x] Run `qa/L0_pytorch_unittest/test.sh`
      - [x] Run C++ operator tests related to normalization/swizzle as applicable
      
      ## Notes
      - Branch synced with latest `origin/release_v2.7` before opening this MR.
      
      See merge request dcutoolkit/deeplearing/TransformerEngine!66
    • Fix tests of L0 test_numeric and L1 test_fusible_ops · abe1fdf5
      wenjh authored
      
      Signed-off-by: wenjh <wenjh@sugon.com>
  2. 30 Oct, 2025 1 commit
  3. 03 Sep, 2025 1 commit
  4. 27 Aug, 2025 1 commit
  5. 25 Aug, 2025 1 commit
  6. 15 Aug, 2025 1 commit
  7. 08 Aug, 2025 1 commit
  8. 29 Jul, 2025 1 commit
  9. 21 Jul, 2025 1 commit
  10. 16 Jul, 2025 1 commit
  11. 15 Jul, 2025 1 commit
  12. 10 Jul, 2025 1 commit
  13. 09 Jul, 2025 1 commit
  14. 04 Jun, 2025 2 commits
  15. 20 May, 2025 1 commit
  16. 14 May, 2025 1 commit
  17. 08 May, 2025 1 commit
  18. 17 Apr, 2025 1 commit
  19. 15 Apr, 2025 1 commit
  20. 07 Apr, 2025 1 commit
  21. 04 Apr, 2025 1 commit
  22. 25 Mar, 2025 1 commit
  23. 22 Mar, 2025 2 commits
  24. 18 Mar, 2025 1 commit
  25. 17 Mar, 2025 1 commit
  26. 13 Mar, 2025 1 commit
  27. 09 Mar, 2025 1 commit
  28. 05 Mar, 2025 1 commit
  29. 04 Mar, 2025 1 commit
  30. 28 Feb, 2025 1 commit
  31. 26 Feb, 2025 1 commit
  32. 24 Feb, 2025 1 commit
  33. 07 Feb, 2025 1 commit
  34. 02 Jan, 2025 1 commit
  35. 20 Dec, 2024 1 commit
  36. 18 Oct, 2024 1 commit
  37. 16 Oct, 2024 1 commit