Commits · 055f8500171304d25276c801cacdc18fadb4dadd · OpenDAS / tilelang

05 Nov, 2025 1 commit

[Feat] Add swap like grammar in tuple assignment (#1185) · 055f8500

Kurisu authored Nov 05, 2025

* [Feat] add 2 phase binding to allow swap two var

* Minor update tvm dtype constructor

* fix lint error

055f8500
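
The "2 phase binding" named in this commit can be illustrated outside tilelang. The sketch below is plain Python and is not taken from the tilelang frontend: `sequential_bind` and `two_phase_bind` are hypothetical helpers, and expressions are modeled as callables over an environment dictionary. It only shows why evaluating every right-hand side before binding any target makes `a, b = b, a` behave as a swap.

```python
from typing import Callable, Dict, List, Sequence

Env = Dict[str, int]
Expr = Callable[[Env], int]  # hypothetical model: an expression reads an environment


def sequential_bind(env: Env, targets: Sequence[str], exprs: Sequence[Expr]) -> Env:
    """Bind targets one by one; later expressions observe earlier writes."""
    out = dict(env)
    for name, expr in zip(targets, exprs):
        out[name] = expr(out)
    return out


def two_phase_bind(env: Env, targets: Sequence[str], exprs: Sequence[Expr]) -> Env:
    """Phase 1: evaluate every right-hand side against the old environment.
    Phase 2: bind the saved values to the targets."""
    temps: List[int] = [expr(env) for expr in exprs]
    out = dict(env)
    for name, value in zip(targets, temps):
        out[name] = value
    return out


env = {"a": 1, "b": 2}
targets = ["a", "b"]
exprs = [lambda e: e["b"], lambda e: e["a"]]  # models `a, b = b, a`

clobbered = sequential_bind(env, targets, exprs)  # {'a': 2, 'b': 2}
swapped = two_phase_bind(env, targets, exprs)     # {'a': 2, 'b': 1}
```

With sequential binding the second expression reads the already-overwritten `a`, so both names end up holding the old `b`; the two-phase version keeps the old values in temporaries until every right-hand side has been evaluated.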

04 Nov, 2025 4 commits

[Refactor] Improve Python3.9 compatibility for ParamSpec and Self (#1190) · 7d961892

Lei Wang authored Nov 04, 2025

* [Feature] Enhance fill operation to support various buffer types

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.

* lint fix

* [Refactor] Improve Python compatibility for ParamSpec and Self

- Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively.
- Updated type annotations across multiple files to ensure consistent usage of typing features.

* [Update] Require Python 3.9 and enhance type annotations

- Updated the minimum required Python version from 3.8 to 3.9 in `pyproject.toml`.
- Removed references to Python 3.8 in classifiers.
- Changed type annotations from `int | None` to `Optional[int]` in multiple example files for better clarity and compatibility.
- Improved import statements to use `collections.abc` for `Iterable` and `contextlib` for `AbstractContextManager` in relevant files.

* [Refactor] Update import statements to enhance type annotations

- Replaced imports from `typing` with `collections.abc` for `Iterable` and `Mapping` in relevant files to improve compatibility and clarity.
- Updated the caching decorator from `functools.lru_cache` to `functools.cache` for better performance in the C++ compiler retrieval function.
- Adjusted import statements in the language proxy file to maintain consistency in type annotations.

* disable rocm rs nt test.

* lint fix

7d961892
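
The compatibility changes listed in this commit follow a common Python pattern. The sketch below is illustrative only and is not copied from the tilelang sources: `logged`, `Builder`, `find_compiler`, and `scale` are hypothetical names, and the fallback branch assumes the `typing_extensions` package is installed on interpreters older than 3.10 or 3.11.

```python
from __future__ import annotations

import functools
import shutil
from collections.abc import Callable, Iterable  # preferred over typing.Iterable
from typing import Optional, TypeVar

try:  # ParamSpec was added to `typing` in Python 3.10
    from typing import ParamSpec
except ImportError:  # assumption: typing_extensions is available on 3.9
    from typing_extensions import ParamSpec

try:  # Self was added to `typing` in Python 3.11
    from typing import Self
except ImportError:  # assumption: typing_extensions is available on 3.9/3.10
    from typing_extensions import Self

P = ParamSpec("P")
R = TypeVar("R")


def logged(fn: Callable[P, R]) -> Callable[P, R]:
    """Hypothetical decorator that keeps the wrapped signature via ParamSpec."""

    @functools.wraps(fn)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        return fn(*args, **kwargs)

    return wrapper


class Builder:
    """Hypothetical fluent builder; `Self` preserves subclass types when chaining."""

    def __init__(self) -> None:
        self.flags: list[str] = []

    def add(self, flags: Iterable[str]) -> Self:
        self.flags.extend(flags)
        return self


@functools.cache  # equivalent to lru_cache(maxsize=None); available since Python 3.9
def find_compiler(name: str = "c++") -> Optional[str]:
    """Hypothetical lookup; returns None when `name` is not on PATH."""
    return shutil.which(name)


@logged
def scale(x: int, factor: Optional[int] = None) -> int:
    # Optional[int] stays valid if the annotation is evaluated on Python 3.9,
    # where `int | None` would raise a TypeError.
    return x * (factor if factor is not None else 1)
```

`functools.cache` skips the eviction bookkeeping of a bounded `lru_cache`, which suits a lookup whose answer does not change during the process lifetime.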

[Feature] Enhance fill operation to support various buffer types (#1189) · a03df604

Lei Wang authored Nov 04, 2025

* [Feature] Enhance fill operation to support various buffer types

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.

* lint fix

a03df604

[Fix] Remove unsupported type params (#1186) · 1768cbef
Kurisu authored Nov 04, 2025
```
* [Fix] Remove type params

* fix lint error

* [Fix] fix dtype new error
```
1768cbef

[CI] [pre-commit.ci] autoupdate (#1183) · 778b97dc

pre-commit-ci[bot] authored Nov 04, 2025

* [CI] [pre-commit.ci] autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.1 → v0.14.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.1...v0.14.3)

* [CI] sync ruff version

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>

778b97dc

03 Nov, 2025 5 commits

[Fix] fix type incompatible error in #1115 (#1180) · 4ef94f22
Kurisu authored Nov 04, 2025
```
* Fix incompatible floordiv in packed api

* fix lint
```
4ef94f22

[Language] Initial version of tilelang frontend v2 (#1120) · 5f202fe5

Kurisu authored Nov 03, 2025

* tilelang frontend v2

* syntax sugar: defining a local var by annotation

* [Refactor] fix type linting warning like `T.float32`

* Add tl.local_var_init for new tl.float32

* allow passing default argument as function annotation

* allow default arguments as annotation

* fix lint error

* minor fix

* [Refactor] refactor tilelang.jit and tilelang.autotune

* minor fix

* minor fix

* minor fix

* fix metal get function name

* add par_compile impl and tests

* Type consistency on tvm datatype
1. `isinstance(tl.float32, tvm.DataType) == True`
2. Allow `tl.float32` as function annotations
3. Allow `tl.float32` as argument to be passed to `tl.alloc` or other functions

* fix lint error

* add more warning in frontend

* update tvm version

* Minor fix on tvm_ffi annotations

* add document and examples

* fix lint error

* Simplify index calculations in example_chunk_o_bwd.py

Refactor index calculations for dg_last_fragment assignment.

* minor fix

* lint fix

---------
Co-authored-by: Lei Wang <leiwang1999@outlook.com>
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>

5f202fe5

[CI]: Bump actions/upload-artifact from 4 to 5 (#1178) · ba390756

dependabot[bot] authored Nov 03, 2025

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

ba390756

[CI]: Bump actions/download-artifact from 5 to 6 (#1177) · 7de095e5

dependabot[bot] authored Nov 03, 2025

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 5 to 6.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

7de095e5

[Bugfix] Legalize Datatype for mma intrinsic codegen (#1179) · 7c61d31a

Lei Wang authored Nov 03, 2025

* fix

* lint fix

* Enhance CUDA code generation by updating register type handling for float data types. Introduced a workaround for TF32 type compatibility and improved the registration of MMA register types for A and B operands.

7c61d31a

02 Nov, 2025 4 commits

[Language] Add Correctness and performance check scripts for V2 (#1174) · d99853b6
Lei Wang authored Nov 03, 2025
```
* fix

* lint fix

* fix

* lint fix

* fix

* upd
```
d99853b6

[Language] Expose `T.warpgroup_fence_operand` for nvcc code motion (#986) · aef0a6bb

Lei Wang authored Nov 03, 2025

* remove debug print

* pipeline fix

* use the correct buffer access scope

* rs support

* warp warpgroup_fence_operand

* fix

* fp8 dtype ptx enhance

* mma fix

* TCGEN05 Interface

* tcgen05 support

* rebase

* update

* Enhance TCGEN05 support by adding new intrinsic operations and descriptors. Introduced `ptx_tcgen05_mma_ts` for tensor-memory to shared-memory instructions and `tcgen05_mma_arrive` for signaling barrier completion. Updated existing descriptors and code generation logic to accommodate these changes, ensuring compatibility with new instruction sets. Refactored related allocation functions and improved handling of shared memory descriptors.

* lint fix

* Refactor buffer reference handling in CUDA code generation and update test execution in tilelang. Ensure default annotations for unrolling are set correctly in TIR IR module.

* wgmma fix

---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>

aef0a6bb

[Bugfix] Fix tvm import path for editable build (#1172) · c85bb3ac
Lei Wang authored Nov 02, 2025

c85bb3ac

[Refactor]: Change the params in pytest to avoid oom error during ci (#1170) · 13bdcd60

Yuqi Dong authored Nov 02, 2025

* [Refactor]: Change the params in pytest to avoid oom error during ci

* format

* fix

* Update test_example_cast.py

* Update parameters in test_example_cast

* Update test_example_flash_attention.py

* update

* format

* fix

* fix

* format

13bdcd60

01 Nov, 2025 1 commit

[Testing] Move TMA 1D and test for its functionality (#1167) · 5c62d00a
Zhengju Tang authored Nov 01, 2025
```
* [Testing] Move TMA 1D and test for its functionality

* [Lint]
```
5c62d00a

31 Oct, 2025 4 commits

[Bugfix] Support 16bits shfl_sync (#1169) · 54d4bd62

Lei Wang authored Oct 31, 2025

* Add type-safe warp shuffle helpers for 16-bit float types in common.h

- Introduced generic passthrough functions for warp shuffle operations: `shfl_xor_sync`, `shfl_down_sync`, `shfl_up_sync`, and `shfl_sync`.
- Added specializations for `cutlass::half_t` and `cutlass::bfloat16_t` to ensure type safety during shuffle operations.
- Updated `reduce.h` to utilize the new shuffle functions, enhancing code clarity and maintainability.

* lint fix

54d4bd62
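
As a rough model of why 16-bit floats need dedicated shuffle helpers, the following Python simulation treats a warp as a list of 32 lanes and moves each half-precision value between lanes as its raw 16-bit pattern. It is not the CUDA code from `common.h` or `reduce.h`: every function name here is hypothetical, and the bit-reinterpretation step is an assumption about how such specializations typically work rather than a description of the actual implementation.

```python
import struct
from typing import List

WARP_SIZE = 32


def half_to_bits(x: float) -> int:
    """Reinterpret a value rounded to IEEE half precision as a 16-bit integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_to_half(bits: int) -> float:
    """Reinterpret a 16-bit integer as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def shfl_xor_bits(lanes: List[int], mask: int) -> List[int]:
    """Model of an xor shuffle: lane i reads the register held by lane i ^ mask."""
    return [lanes[i ^ mask] for i in range(len(lanes))]


def shfl_xor_half(values: List[float], mask: int) -> List[float]:
    """Shuffle half values by moving their bit patterns, then converting back."""
    bits = [half_to_bits(v) for v in values]
    return [bits_to_half(b) for b in shfl_xor_bits(bits, mask)]


def warp_all_reduce_sum(values: List[float]) -> List[float]:
    """Butterfly reduction: after log2(32) xor shuffles every lane holds the sum."""
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset >= 1:
        other = shfl_xor_half(vals, offset)
        # Round each partial sum back to half precision, as a 16-bit register would.
        vals = [bits_to_half(half_to_bits(a + b)) for a, b in zip(vals, other)]
        offset //= 2
    return vals


result = warp_all_reduce_sum([float(i) for i in range(WARP_SIZE)])
```

Every intermediate sum in this example is a small integer, so the half-precision rounding is exact and all 32 lanes end with the same total.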

[Bugfix] Enable code lowering with producer‑copy‑only program (#1168) · 7a80b6df

Lei Wang authored Oct 31, 2025

* bugfix

* lint fix

* Enhance warp group register allocation to handle missing consumer bodies gracefully. Updated logic to annotate producer side when consumer is absent, ensuring robustness in degenerate warp-specialized patterns.

* Refactor VisitExpr_ method in inject_tma_barrier.cc for improved readability. Adjusted formatting and spacing for clarity in barrier handling logic.

* Update barrier handling in inject_tma_barrier.cc to accommodate newly appended entries. Adjusted the size of the replace vector to ensure it covers the full needed length, and modified the logic for appending barriers based on the updated replace conditions.

7a80b6df

[FFI] Rebase tvm to v0.22.0 to utilize tvm-ffi (#1108) · 10911e28

Lei Wang authored Oct 31, 2025

* 3rdparty tvm bump

* bump tvm into v0.22.0

* lint fix

* rebase tvm

* Update submodule tvm to latest commit 3085bc4

* Refactor: Update configuration retrieval in CopyNode and adjust test registration in tilelang

* test fix

* add requirement

* atomic_fix

* atomic_fix

* phaseout py39

* optimize

* optimize

* lint fix

* do not clean cache

* do not clean cache

* [Minor] Minor update for Python versions and dependencies

* [Lint] fix lint for py39

* [Lint] fix lint for ROCm

* [Build][CI] Sync CI changes from upstream/sdist

* [Lint] fix lint for ROCm

* [Build][CI] Update `repair-wheel-command`

* [Minor] update abi3audit result format

* [Lint] fix lint for ROCm

* [BugFix] fix build

* [Lint] fix lint for ROCm

* [BugFix] set rpath for libtvm and libtvm_runtime

* [Deps] pin apache-tvm-ffi version

* [Build] set Python 3.9 Limited API for Cython target

* [Build] set Python 3.9 Limited API for Cython target

* [Deps] Restore Python 3.8 support

* [Build] use `apache-tvm-ffi`'s `libtvm_ffi`

* [BugFix] use `;` as delimiter for RPATH on macOS

* [BugFix] use `--ignore-missing-dependencies` for `delocate-wheel`

* [Build] support `sccache` if available

* [Build] add CIBW import test

* [Build][CI] enable ccache for CIBW on Linux

* [BugFix] set rpath for libtvm and libtvm_runtime

* Revert "[Build][CI] enable ccache for CIBW on Linux"

This reverts commit cd9ab57bb5ddd2572c60bcbbebde81480a658fd3.

* [CI] fix perfbench bot

* [BugFix] use Python 3.9 to build wheel

* [Minor] update perfbench bot envs

* [BugFix] fix CIBW environment on Linux

* [CI] skip import test on CentOS 7

* [CI] use Python urllib to download file instead of Wget

---------
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>

10911e28

[Release] Bump version to v0.1.6.post2 (#1160) · c37621c5

Lei Wang authored Oct 31, 2025

* [Release] Update README and VERSION for v0.1.6.post2 compatibility with Python 3.8

* [Enhancement] Update packaging configuration and Docker scripts for multi-architecture support

* Add allowlist for TVM, CUTLASS, and Composable Kernel items in pyproject.toml
* Enhance docker_local_distribute.sh to support cross-architecture builds using docker buildx
* Modify pypi.manylinux.Dockerfile to accept TARGETARCH argument for better architecture handling

* [Enhancement] Improve Docker scripts and build process for multi-architecture support

* Update .gitignore to include dist directories
* Refactor docker_local_distribute.sh for better cross-architecture handling and error management
* Enhance docker_pypi_distribute.sh to support multi-architecture builds with docker buildx
* Modify pypi_distribution.sh to clean up additional directories
* Update pypi.manylinux.Dockerfile for improved environment configuration and architecture handling

* fix

* Remove outdated classifier for Artificial Intelligence from pyproject.toml

* Update pyproject.toml classifiers and modify Docker distribution scripts for clarity

* Add new classifier for Artificial Intelligence in pyproject.toml
* Rename output directories in docker_local_distribute.sh and docker_pypi_distribute.sh for better context

c37621c5

29 Oct, 2025 6 commits

[Bugfix] Enhance LetStmt handling in Vectorize Loop Pass (#1159) · 79730b11

Lei Wang authored Oct 30, 2025

* [Refactor] Enhance TLVectorizer with loop vectorization convenience method and improve let variable handling

* lint fix

* let test fix

* lint fix

79730b11

[Enhancement] Enhance Cast operations Vectorization (#1156) · feef9ef6
LJC00118 authored Oct 29, 2025
```
* Enhance Cast vectorized

* Add Parallel vectorized cast test

* code lint

* merge newest commit
```
feef9ef6

[Refactor]: Move device_assert from extern_call to intrin_call (#1134) · 198f22b3
Yuqi Dong authored Oct 29, 2025
```
* update

* Update codegen_cuda.cc
```
198f22b3

[BugFix] Correct direct copy from bf16 to fp8 (#1090) · e1b12bd0

Cunxiao Ni authored Oct 29, 2025

* [BugFix] Correct direct copy from bf16 to fp8

* fix lint

* implement overloaded cast codegen for type conversion

* fix lint

* remove test

* fix lint

* trigger CI

* Overload fp8 for implicit conversion

* format

* new format

* fix: Reinterpret types to cute types in GEMM

* new format

* fix lint

* new format

* fix lint

* format

* trigger ci

---------
Co-authored-by: nicunxiao <nicunxiao@bytedance.com>

e1b12bd0

[CI] use Python urllib to download file instead of Wget (#1154) · d9a0f131
Xuehai Pan authored Oct 29, 2025

d9a0f131
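
For readers unfamiliar with the approach named in this commit title, a download helper built only on the standard library might look like the sketch below. The function name `download`, its parameters, and the timeout default are hypothetical and are not taken from the tilelang CI scripts.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Fetch `url` into `dest` using only the standard library (no wget needed)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # Stream in chunks so large files are never held fully in memory.
        shutil.copyfileobj(response, out)
    return dest
```

Relying on `urllib` removes the dependency on a `wget` binary being present in the CI image, at the cost of handling retries and progress output manually if they are needed.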
[CI] allow dirty workspace for `format.sh` and introduce loop carry thread sync unit test (#1153) · 4efd2d2d
Lei Wang authored Oct 29, 2025
```
* atomic_fix

* atomic_fix

* mem fix

* lint fix

* add some comments

* fix

* fix

* lint fix

* handle async copy

* lint fix

* lint fix
```
4efd2d2d

28 Oct, 2025 5 commits

[Bugfix] Implement classic arena algorithm for shmem merge and WAW conflict detection (#1146) · f7ba45d8
Lei Wang authored Oct 29, 2025
```
* atomic_fix

* atomic_fix

* mem fix

* lint fix

* add some comments

* fix

* fix

* lint fix

* handle async copy

* lint fix
```
f7ba45d8

[BugFix] Implement bfloat16 support in CUDA code generation with min/max functions and inf/nan values (#1143) · c70b2697
Tong WU authored Oct 29, 2025
```
* Implement bfloat16 support in CUDA code generation with min/max functions and inf/nan values

* refactor

* fix prev typo

* bugfix

* lint

* bugfix
```
c70b2697

[Refactor] Remove amd gemm_v2 tests (#1149) · bc773c56
Lei Wang authored Oct 29, 2025

bc773c56

[BugFix] alloc_var init failed to handle complex expression (#1144) · 399af087
Kurisu authored Oct 28, 2025
```
* [Fix] init var with complex expression

* fix lint error
```
399af087

[AMD] Support T.gemm_v2 for AMD Backend (#1136) · 60567ba3
Jiaxing Ding authored Oct 28, 2025

60567ba3

27 Oct, 2025 9 commits

[Bugfix] Correctly construct the argument list for atomic add based on the vector size (#1137) · 7d389a43
Lei Wang authored Oct 28, 2025
```
* atomic_fix

* atomic_fix
```
7d389a43

[BugFix] Add memory order and testing script for split version GQA bwd kernel (#1100) · 853f9c3d
Zhengju Tang authored Oct 28, 2025
```
* [BugFix] Add memory order for split version kernel; Remove torch manual seed

* [Lint] Manual
```
853f9c3d

Add int2 and longlong4 pack functions (#1129) · 4c9da81a

LJC00118 authored Oct 27, 2025

* Remove an incorrect check

* add fp8 pack function

* code lint

* minor fix

* minor fix

* minor fix

* Minor fix

* Minor fix

* add pack function

* code lint

* code lint

4c9da81a

[Benchmark] Update triton and helion baselines in mamba-chuk-scan (#1131) · 95e7bc37
Yu Cheng authored Oct 27, 2025
```
* [Benchmark] Update triton and helion baselines in mamba-chuk-scan

* lint

* update mamba baseline version
```
95e7bc37

[Build][CI] Build and test SDist in release CI (#1098) · 6e1dc6a1
Xuehai Pan authored Oct 27, 2025

6e1dc6a1

[Feature]: Add device assert (#1116) · 5475f8e7
Yuqi Dong authored Oct 27, 2025
```
* update

* update
```
5475f8e7

[Enhancement] Add missing `fence_barrier_init` primitive after mbarrier init (#1121) · 17a63976
Yu Cheng authored Oct 27, 2025
```
* [Enhancement] Add missing `fence_barrier_init` primitive after mbarrier init

* lint
```
17a63976

[CI]: Bump actions/download-artifact from 5 to 6 (#1127) · 0dc50a54

dependabot[bot] authored Oct 27, 2025

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 5 to 6.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

0dc50a54

[CI]: Bump actions/upload-artifact from 4 to 5 (#1128) · 69113a6d

dependabot[bot] authored Oct 27, 2025

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

69113a6d

25 Oct, 2025 1 commit

[Feature] Add memory_order PTX for vectorized atomic add (#1112) · 59865bdf

Zhengju Tang authored Oct 25, 2025

* [Feature] Add memory_order PTX for vectorized (2x) atomic add

* [Feature] Add memory_order PTX for all vectorized atomic add

* [Lint]

* test

* [BugFix] Fix init optional argument in alloc_var

* bug fix

* bug fix

* lint fix

* lint fix

---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>

59865bdf