Commits · b998121c828aaa3f954caa4c72e70dbcf71e1272 · OpenDAS / TransformerEngine

09 Jan, 2026 1 commit

Fix swizzle, swap_first_dims and RMSNorm issues on release_v2.7 (Rocky 8.6) · b998121c

wuyf1 authored Jan 09, 2026

## Summary
Fix swizzle / swap_first_dims RTC build and normalization test issues on `release_v2.7` (ROCm/HIP).

## Background
- ROCm/HIP path currently hits build/runtime/test issues in:
  - `swizzle_scaling_factors` (HIP compile constraints with `__device__ __host__` constexpr)
  - RTC `swap_first_dims` source selection
  - `test_normalization` when `use_cudnn` is enabled for LayerNorm/RMSNorm
  - PyTorch L0 unittest environment relying on `PYTHONPATH`

## Changes
1) **qa/L0_pytorch_unittest/test.sh**
   - Export `PYTHONPATH` to include `${TE_PATH}` so tests can import from the source tree.
   - Remove the explicit `pip3 install pytest==8.2.1`, so the script no longer reinstalls pytest.

2) **tests/cpp/operator/test_normalization.cu**
   - Skip LayerNorm/RMSNorm cases when `use_cudnn` is enabled:
     - `GTEST_SKIP(): CudnnLayerNorm and CudnnRmsNorm are disabled.`
   - Avoids running unsupported/disabled cuDNN normalization paths in this configuration.

3) **transformer_engine/common/CMakeLists.txt**
   - Fix RTC header generation for `swap_first_dims` on ROCm:
     - use `transpose/rtc/swap_first_dims.hip` instead of `.cu`.

4) **transformer_engine/common/swizzle/swizzle.cu**
   - For `__HIP_PLATFORM_AMD__`, replace `constexpr __device__ __host__ int ...` with plain `constexpr int ...`
   - Keeps CUDA path unchanged.
   - Addresses HIP compilation constraints while preserving constants’ values and usage.
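
A minimal sketch of the qualifier selection described in item 4, not the actual diff. The macro `SKETCH_CONSTANT` and the constant names and values are hypothetical placeholders; the real change may use plain `#ifdef` blocks around each declaration. Only the plain host-compiler branch was considered when checking that this builds.

```cpp
#include <cassert>

// SKETCH_CONSTANT is a hypothetical helper macro (an assumption, not from the commit).
#if defined(__HIP_PLATFORM_AMD__)
// ROCm/HIP path: plain constexpr, execution-space qualifiers dropped.
#define SKETCH_CONSTANT constexpr
#elif defined(__CUDACC__)
// CUDA path: original qualifiers kept unchanged.
#define SKETCH_CONSTANT constexpr __device__ __host__
#else
// Plain host compiler, so the sketch builds without a GPU toolchain.
#define SKETCH_CONSTANT constexpr
#endif

// Hypothetical constants; names and values are placeholders.
SKETCH_CONSTANT int kSketchBlockSize = 32;
SKETCH_CONSTANT int kSketchTileDim = 4;

// The values are the same on every path, so compile-time math that
// depends on them gives the same result regardless of platform.
constexpr int sketch_elems_per_tile() { return kSketchBlockSize * kSketchTileDim; }

static_assert(sketch_elems_per_tile() == 128, "constants must keep their values");
```

Selecting only the qualifiers, rather than duplicating each declaration per platform, keeps a single definition of every value, which is one way to guarantee the "preserving constants' values" property noted above.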

## Verification
- [x] Build in the rocky_8.6 Docker environment on 10.16.4.9
- [x] Run `qa/L0_pytorch_unittest/test.sh`
- [x] Run C++ operator tests related to normalization/swizzle as applicable

## Notes
- Branch synced with latest `origin/release_v2.7` before opening this MR.

See merge request dcutoolkit/deeplearing/TransformerEngine!66

18 Jul, 2025 1 commit

[Test] Enable cuDNN Norm tests in the CPP suite (#1957) · 86c50977

Phuong Nguyen authored Jul 18, 2025



* enable cudnn norm tests
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

* exclude tests on pre-Hopper
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

29 Apr, 2025 1 commit

Kwyss/new shape owns data (#1708) · afb70224

kwyss-nvidia authored Apr 29, 2025

* Reapply "Allow NVTEShape to own data." (#1703)

This reverts commit 91405eb4.
Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Update code so that data is replaced by an array.
Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Specify unambiguous Tensor constructor in tests.
Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Fix assumption in test of 2D shape.
Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Remove row and col
Signed-off-by: Keith Wyss <kwyss@nvidia.com>

---------
Signed-off-by: Keith Wyss <kwyss@nvidia.com>

17 Apr, 2025 1 commit

Support computing zero-centered gamma in compute dtype for CuDNN (#1690) · 61f1bf6f

jberchtold-nvidia authored Apr 17, 2025



* Add a flag to support computing zero-centered gamma in weight dtype or compute dtype for CuDNN
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

* Address comments
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

---------
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

01 Apr, 2025 1 commit

[DCU] fix fp8 · fbee8990

yuguo authored Apr 01, 2025

20 Mar, 2025 1 commit

[DCU] Preliminary adaptation · c520cba3

yuguo authored Mar 20, 2025

07 Feb, 2025 1 commit

Update main branch with TE 2.0 code, update version to 2.1.0.dev0 · 544dd14b

Przemek Tredak authored Feb 07, 2025

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

02 Jan, 2025 1 commit

Update copyright to include 2025 (#1388) · c9ea6be9

Kirthi Shankar Sivamani authored Jan 02, 2025

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

06 Dec, 2024 1 commit

[C] Normalization Refactor + Adding CUDNN backend (#1315) · 3102fdd1

Phuong Nguyen authored Dec 06, 2024



* cuDNN normalization integration
* TE Norm refactor
* TE Norm APIs changes.

---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
