Commits · 47039f06979f2455e5e73f8807791d4e6a1c027f · OpenDAS / tilelang

10 Nov, 2025 6 commits

[Language] Refactor reduce and support shared memory as its in/out (#1219) · 47039f06

Lei Wang authored Nov 10, 2025

* [Refactor] Update ReduceOpNode to use absolute values in Max computation and remove unused shared memory reduction logic

* Changed Max computation for AbsMax type to use absolute values of lhs and rhs.
* Removed unused shared memory reduction logic and related checks for buffer dimensions and thread extents, simplifying the Lower method.
* Added a fatal log for unsupported buffer scope reductions.
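
The AbsMax change above can be pictured with a small host-side sketch. This is not the `ReduceOpNode` code; `absmax_combine` and `absmax` are hypothetical names, and the sketch only assumes that the combine step compares the magnitudes of both operands, as the commit message describes.

```python
from functools import reduce


def absmax_combine(lhs: float, rhs: float) -> float:
    """Combine step of an abs-max reduction: compare magnitudes, not signed values."""
    return max(abs(lhs), abs(rhs))


def absmax(values):
    # 0.0 is the identity element for a maximum taken over non-negative magnitudes.
    return reduce(absmax_combine, values, 0.0)


print(absmax([1.5, -7.0, 3.0]))  # 7.0
```

Taking `abs` of both operands inside the combine step keeps the result correct even when a partial result or an incoming element is negative.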

* reduce fix

* [Fix] Update type check for eval value in Builder class

* Changed the type check for eval values to raise a TypeError for unsupported types, specifically excluding instances of tvm.tir.Buffer. This improves error handling and clarity in the Builder class.

47039f06

[Enhancement] Improve iterator handling in layout utilities and parallel operations (#1221) · 2957afca

Lei Wang authored Nov 10, 2025

* [Enhancement] Improve iterator handling in layout utilities and parallel operations

* Added a new function, DivideUnusedIterators, to detect per-iterator gaps in fused index expressions, enhancing the accuracy of unused iterator detection.
* Updated CompleteBufferFragment to prefer direct inversion for bijective index mappings and introduced a fallback mechanism for non-bijective cases, improving layout inversion robustness.
* Added a new test for layout inference in fused kernels to ensure correct compilation and execution without layout inversion failures.

* lint fix

2957afca

[Bugfix] Improve error handling in LayoutNode::InverseWithLevel (#1215) (#1220) · cf46b7bd

Lei Wang authored Nov 10, 2025

* Added logging and exception handling for layout errors in InverseWithLevel method.
* Replaced direct error check with a throw statement to enhance error reporting and debugging capabilities.

cf46b7bd

[Utils] Add source export, NVCC-based PTX/SASS dump, logging (#1216) · 7e5b1cd2

Lei Wang authored Nov 10, 2025

* [Enhancement] Add NVCC support for PTX and SASS generation in TileLang

* Introduced functions to compile CUDA C++ source to PTX and SASS formats, enhancing the ability to generate intermediate representations for CUDA kernels.
* Added default compile options for NVCC, including paths for TileLang templates, CUTLASS, and CUDA includes.
* Implemented methods to export and display generated PTX and SASS code, improving usability for developers working with CUDA targets.
* Updated JITKernel class to integrate new NVCC functionalities for PTX and SASS handling, ensuring compatibility with existing workflows.

* [Fix] Improve error handling in get_sass_from_source function

* Added contextlib to suppress exceptions when removing temporary files, enhancing robustness.
* Fixed formatting of error message for clarity when CUDA tools are not found, ensuring better user feedback.
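
A minimal sketch of the cleanup pattern mentioned above, using `contextlib.suppress` from the standard library. The helper name `remove_quietly` is hypothetical, and narrowing the suppressed exception to `OSError` is an assumption; the commit message does not say which exceptions are suppressed.

```python
import contextlib
import os
import tempfile


def remove_quietly(path: str) -> None:
    """Delete a temporary file, ignoring OS-level failures such as a missing file."""
    with contextlib.suppress(OSError):
        os.remove(path)


# Create a scratch file, then clean it up twice; the second call is a no-op.
fd, tmp_path = tempfile.mkstemp(suffix=".cu")
os.close(fd)
remove_quietly(tmp_path)
remove_quietly(tmp_path)
```

Suppressing errors only around the cleanup step keeps a failed deletion from masking the result or the error of the compilation itself.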

* [Enhancement] Preserve user flags in NVCC compile options

* Updated the default_compile_options function to preserve user-specified compile flags, including repeated tokens, by utilizing shlex for proper tokenization.
* This enhancement improves the flexibility and accuracy of NVCC compile options, ensuring that all user inputs are correctly handled.
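
The tokenization idea can be sketched with `shlex.split`, which splits a flag string the way a POSIX shell would. `merge_compile_options` is a hypothetical helper, not the signature of `default_compile_options`; the sketch assumes user flags are simply appended to the defaults without de-duplication.

```python
import shlex


def merge_compile_options(defaults, user_flags):
    """Append user flags to the defaults, keeping their order and any repeated tokens."""
    if isinstance(user_flags, str):
        # shlex honours quoting, so a quoted path with spaces stays one token.
        user_tokens = shlex.split(user_flags)
    else:
        user_tokens = list(user_flags)
    return list(defaults) + user_tokens


opts = merge_compile_options(
    ["-std=c++17"],
    '-Xcompiler -fPIC -Xcompiler -O2 -I "/opt/my includes"',
)
print(opts)
```

Repeated tokens matter for flags such as `-Xcompiler`, where each occurrence introduces a different argument, so collapsing duplicates would change the command.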

7e5b1cd2

[Build] Explicitly add `libtvm` as a dep of `libtilelang` (#1215) · 2bc45bc3
Yichen Yan authored Nov 10, 2025

2bc45bc3
[Fix] Fix buffer re-import typo in tilelang.language (#1214) · d5fda276
Kuris authored Nov 10, 2025
```
* Fix Buffer re-import typo in tilelang.language

* fix lint error
```
d5fda276

09 Nov, 2025 1 commit

[Bugfix] Enhance LetStmt Handling in Pipeline Transform (#1212) · 85218bd9

Lei Wang authored Nov 09, 2025

* [Enhancement] Introduce LetWrapper for handling loop variable substitutions in pipeline rewriting

* Added LetWrapper struct to encapsulate variable and value pairs for loop variable substitutions.
* Updated PipelineRewriter to accept a vector of LetWrapper instances, allowing for proper handling of Let statements that depend on the pipeline loop variable.
* Enhanced the BuildPipeline method to incorporate LetWrapper instances into rewritten blocks, ensuring correct substitutions during pipeline execution.
* Refactored logic for processing Let statements to differentiate between those that use the loop variable and those that do not, improving the flexibility of the pipeline transformation.

* Refactor lambda expression for clarity in loop variable usage check in inject_pipeline.cc

* [Test] Add regression test for loop variable handling in kernel compilation

* Introduced a new test case to verify correct handling of loop variables in the kernel compilation process, addressing a regression issue with InjectSoftwarePipeline.
* The test ensures that the loop variable is not left as a free variable, which previously caused failures in MakePackedAPI.
* Configurations are set to disable warp specialization and TMA lowering to align with the original issue reproduction.

* Remove unused import in regression test for loop variable handling in kernel compilation

85218bd9

08 Nov, 2025 2 commits

[Enhancement] Improve handling of negative indices for ramp and broadcast node (#1207) · 918a21bd

Lei Wang authored Nov 09, 2025

* [Enhancement] Improve handling of negative indices in legalize_negative_index pass

* Added logic to handle scalar and vector indices separately, enhancing the ability to determine non-negativity and negativity of indices.
* Introduced detailed logging for cases where non-negativity cannot be proven, improving debugging capabilities.
* Refactored index state determination for vector types, including support for Ramp and Broadcast nodes.
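
The scalar and vector cases above can be illustrated on plain integers. This is only a sketch: it assumes a negative index counts back from the end of the dimension (Python-style), models a Ramp as `base + i * stride` over `lanes` lanes, and uses the hypothetical names `legalize_index` and `legalize_ramp`. The real pass works on symbolic TIR expressions rather than constants.

```python
def legalize_index(index: int, extent: int) -> int:
    """Scalar case: shift a negative index by the dimension extent."""
    return index + extent if index < 0 else index


def legalize_ramp(base: int, stride: int, lanes: int, extent: int) -> list:
    """Vector case: classify all lanes of a ramp together, then shift if needed."""
    lane_values = [base + i * stride for i in range(lanes)]
    if all(v >= 0 for v in lane_values):
        return lane_values  # provably non-negative, nothing to rewrite
    if all(v < 0 for v in lane_values):
        return [v + extent for v in lane_values]  # provably negative, shift every lane
    # Mixed signs cannot be classified as a whole in this simplified model.
    raise ValueError("cannot prove the sign of every lane")


print(legalize_index(-1, 8))         # 7
print(legalize_ramp(-4, 1, 4, 16))   # [12, 13, 14, 15]
```

A Broadcast node is the simpler vector case: every lane holds the same scalar, so classifying that one value classifies the whole vector.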

* Fix incorrect lane handling in legalize_negative_index pass by dereferencing lanes to obtain the correct integer value.

* Enhance legalize_negative_index pass by including necessary header for TIR operations. This addition supports improved functionality and maintainability of the transformation logic.

918a21bd

Fix Dockerfile.cu128 (#1208) · 4818d209
Jesse authored Nov 08, 2025

4818d209

07 Nov, 2025 3 commits

[Bugfix][Language V2] Capture closure variables from program (#1206) · 8119550b

Lei Wang authored Nov 07, 2025



* Enhance CUDA code generation by improving register type handling for float data types and introducing a workaround for TF32 compatibility. Updated MMA register type registration for A and B operands to boost performance and ensure correctness.

* lint fix

---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>

8119550b

[Bugfix] Improves the accuracy of dependency analysis in the storage access (#1205) · c8ec3469

Lei Wang authored Nov 07, 2025

* Refactor storage access visitor in TileLang to improve readability and maintainability. Organized includes, enhanced comments, and preserved access summaries during condition evaluations in IfThenElse statements. Adjusted handling of buffer accesses and thread invariance checks for better clarity.

* lint fix

c8ec3469

fix data type (#1204) · 556e87bf
LJC00118 authored Nov 07, 2025

556e87bf

06 Nov, 2025 3 commits

[Feat] Add A Pass to Handle Negative Index (#1192) · 0592834f
Kurisu authored Nov 06, 2025

0592834f

[Feat] Add support for `T.serial` with step and negative step (#1188) · 777881e1

Kurisu authored Nov 06, 2025



* [Feature] Support serial for with step

* add more tests

* fix

* Enhance trip count validation in SerialForWithStep to ensure non-zero step values and prevent undefined behavior. Added error handling for zero step values and improved logging for non-constant steps.

* Update builder.py

* fix lint error
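
The trip-count validation described for `SerialForWithStep` can be sketched for constant bounds as follows. `trip_count` is a hypothetical helper, and the sketch assumes the loop follows Python `range` semantics (exclusive stop, empty loop when the step points away from the stop).

```python
def trip_count(start: int, stop: int, step: int) -> int:
    """Number of iterations of a serial loop from start to stop (exclusive)."""
    if step == 0:
        # A zero step would never terminate and would divide by zero below.
        raise ValueError("step must be non-zero")
    if step > 0:
        span = stop - start
    else:
        span, step = start - stop, -step
    # Ceiling division, clamped at zero for empty loops.
    return max(0, (span + step - 1) // step)


print(trip_count(0, 10, 3))    # 4
print(trip_count(10, 0, -3))   # 4
```

Rejecting a zero step up front is what prevents the undefined behavior mentioned in the commit message; a non-constant step cannot be checked this way at compile time, which is presumably why the commit adds logging for that case.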

---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>

777881e1

[CI] Enable `ccache` for CIBW on Linux (#1184) · a59d41d6

Yichen Yan authored Nov 06, 2025

* Enable ccache for linux cibw, unify ccache settings.

* hash cc files to avoid getting stuck in some cases

* Add comments about ccache version

* fix wrong gitignore

a59d41d6

05 Nov, 2025 8 commits

[SM70] Refactor and minor fix for SM70 (#1195) · 4a9cb470

Lei Wang authored Nov 06, 2025

* [Feature] Add support for SM70 tensor core MMA instructions

- Introduced new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation.
- Added `GemmMMASm70` class for handling GEMM operations specific to SM70 architecture.
- Implemented layout functions for Volta swizzled layouts and updated existing GEMM layout inference logic.
- Updated `requirements-dev.txt` to include `apache-tvm-ffi` dependency.
- Added correctness evaluation script for testing GEMM operations on SM70.

* [Refactor] Update formatting and installation commands in scripts

- Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag for user-specific installations.
- Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters.
- Cleaned up spacing and formatting in various C++ source files for better consistency and readability.
- Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity.
- Ensured consistent formatting in layout initialization and swizzle functions.

* typo fix

4a9cb470

[Feature] Add `tl.infinity` operator for infinity handling of bfloat16 (#1175) · 11456de2

Tong WU authored Nov 06, 2025



* Update dependency version for apache-tvm-ffi in pyproject.toml to fix CI

* [Math] Add `tl.infinity` operation and update Python interface for infinity handling

- Implemented `infinity_op` in C++ to return infinity values for supported data types.
- Registered new operation `tl.infinity` with appropriate attributes.
- Updated Python interface to call the new `tl.infinity` operation instead of the previous method.

* Add unit tests for `tl.infinity` operation in TileLang

- Introduced a new test file `test_tilelang_language_infinity.py` to validate the behavior of the `tl.infinity` operation across multiple data types (float16, bfloat16, float32, float64).
- Implemented a kernel to fill a tensor with infinity values and asserted the correctness of the output against PyTorch's `torch.inf`.
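
For reference, the infinity value that such a test checks has a fixed bit pattern in each of the listed floating-point formats. The sketch below is plain Python, not TileLang code, and the helper names are hypothetical. It derives the bfloat16 pattern by taking the upper 16 bits of the float32 encoding, which is exact for infinity.

```python
import math
import struct


def float16_bits(x: float) -> int:
    return struct.unpack("<H", struct.pack("<e", x))[0]


def float32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float64_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bfloat16_bits(x: float) -> int:
    # bfloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of float32.
    return float32_bits(x) >> 16


print(hex(float16_bits(math.inf)))   # 0x7c00
print(hex(bfloat16_bits(math.inf)))  # 0x7f80
print(hex(float32_bits(math.inf)))   # 0x7f800000
print(hex(float64_bits(math.inf)))   # 0x7ff0000000000000
```

In every format infinity is an all-ones exponent with a zero mantissa, so a per-dtype operator only needs to pick the right width.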

* lint

---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>

11456de2

[Refactor] Dynamic registration of FP8 data type for compatibility with older... · c67d66a3
Lei Wang authored Nov 05, 2025
```
[Refactor] Dynamic registration of FP8 data type for compatibility with older PyTorch versions (#1197)
```
c67d66a3
[Example] Update GQA varlen fwd (#1173) · a9d823b8
Yu Cheng authored Nov 05, 2025
```
* [Example] Update GQA varlen fwd

* fix
```
a9d823b8

[GQA] Use TMA in GQA bwd kernel to boost performance (#1176) · 298ab480

Zhengju Tang authored Nov 05, 2025



* [Test] Add cp async to avoid register spill

* [BugFix] GQA fwd and bwd
- Fix the undefined behavior of -inf in acc_s
- Fix the causal loop range in varlen scenario

* [TMA] Move on to TMA and locate the register spill issue

* [Debug] Not the reason of zero-assignment. Probably the combination of Parallel op & conditional qkT

* [Debug] The SIMT copy in producer occupies too many registers

* [BugFix] Use 3D lse and delta to avoid illegal instruction

* [Perf] Relaxed order for dQ and SIMT store for dKdV

* [Feat] For atomic add version

* [Lint]

* [Bugfix] Enable code lowering with producer‑copy‑only program (#1168)

* bugfix

* lint fix

* Enhance warp group register allocation to handle missing consumer bodies gracefully. Updated logic to annotate producer side when consumer is absent, ensuring robustness in degenerate warp-specialized patterns.

* Refactor VisitExpr_ method in inject_tma_barrier.cc for improved readability. Adjusted formatting and spacing for clarity in barrier handling logic.

* Update barrier handling in inject_tma_barrier.cc to accommodate newly appended entries. Adjusted the size of the replace vector to ensure it covers the full needed length, and modified the logic for appending barriers based on the updated replace conditions.

* [Bugfix] Support 16bits shfl_sync (#1169)

* Add type-safe warp shuffle helpers for 16-bit float types in common.h

- Introduced generic passthrough functions for warp shuffle operations: `shfl_xor_sync`, `shfl_down_sync`, `shfl_up_sync`, and `shfl_sync`.
- Added specializations for `cutlass::half_t` and `cutlass::bfloat16_t` to ensure type safety during shuffle operations.
- Updated `reduce.h` to utilize the new shuffle functions, enhancing code clarity and maintainability.

* lint fix

* [Testing] Move TMA 1D and test for its functionality (#1167)

* [Testing] Move TMA 1D and test for its functionality

* [Lint]

* [Refactor]: Change the params in pytest to avoid oom error during ci (#1170)

* [Refactor]: Change the params in pytest to avoid oom error during ci

* format

* fix

* Update test_example_cast.py

* Update parameters in test_example_cast

* Update test_example_flash_attention.py

* update

* format

* fix

* fix

* format

* [Bugfix] Fix tvm import path for editable build (#1172)

* [Language] Expose `T.warpgroup_fence_operand` for nvcc code motion (#986)

* remove debug print

* pipeline fix

* use the correct buffer access scope

* rs support

* warp warpgroup_fence_operand

* fix

* fp8 dtype ptx enhance

* mma fix

* TCGEN05 Interface

* tcgen05 support

* rebase

* update

* Enhance TCGEN05 support by adding new intrinsic operations and descriptors. Introduced `ptx_tcgen05_mma_ts` for tensor-memory to shared-memory instructions and `tcgen05_mma_arrive` for signaling barrier completion. Updated existing descriptors and code generation logic to accommodate these changes, ensuring compatibility with new instruction sets. Refactored related allocation functions and improved handling of shared memory descriptors.

* lint fix

* Refactor buffer reference handling in CUDA code generation and update test execution in tilelang. Ensure default annotations for unrolling are set correctly in TIR IR module.

* wgmma fix

---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>

* [Language] Add Correctness and performance check scripts for V2 (#1174)

* fix

* lint fix

* fix

* lint fix

* fix

* upd

* [Bugfix] Legalize Datatype for mma intrinsic codegen (#1179)

* fix

* lint fix

* Enhance CUDA code generation by updating register type handling for float data types. Introduced a workaround for TF32 type compatibility and improved the registration of MMA register types for A and B operands.

* [Perf] Add layout and use_tma to boost performance

* [Lint]

* [Note]

---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
Co-authored-by: Yuqi Dong <134183314+yyttt6@users.noreply.github.com>
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>

298ab480

[Language] Support n>256 for v2 (#1182) · b66a93c5

Lei Wang authored Nov 05, 2025

* fix

* lint fix

* fix

* lint fix

* fix

* upd

* support n>256

* Remove unnecessary pass configurations for fast math in MHA forward BHSD latency script.

* lint fix

* lint fix

b66a93c5

[Release] Unify local build scripts to use `cibuildwheel` and reduce size of sdist (#1171) · 354e9aff

Yichen Yan authored Nov 05, 2025

* update exclude in sdist

* reuse cibw workflow in maint

* update

* fix

* fmt

* upload artifacts for [Release] PRs

* dot-prefix version file

* update

354e9aff

[Feat] Add swap like grammar in tuple assignment (#1185) · 055f8500

Kurisu authored Nov 05, 2025

* [Feat] add 2 phase binding to allow swap two var

* Minor update tvm dtype constructor

* fix lint error
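
The two-phase binding idea can be sketched with a dictionary standing in for the variable scope. `bind_tuple_assignment` is a hypothetical helper, not the frontend's implementation; it assumes that all right-hand sides are evaluated against the old bindings before any target is rebound.

```python
def bind_tuple_assignment(env: dict, targets: list, value_exprs: list) -> dict:
    """Model `a, b = b, a` as evaluate-everything, then bind-everything."""
    # Phase 1: evaluate every right-hand side while the old bindings are still visible.
    values = [expr(env) for expr in value_exprs]
    # Phase 2: rebind the targets; no evaluation happens here, so order is irrelevant.
    for name, value in zip(targets, values):
        env[name] = value
    return env


env = {"a": 1, "b": 2}
bind_tuple_assignment(env, ["a", "b"], [lambda e: e["b"], lambda e: e["a"]])
print(env)  # {'a': 2, 'b': 1}
```

Binding each target immediately after evaluating its own right-hand side would instead leave both names holding the same value, which is the failure a swap exposes.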

055f8500

04 Nov, 2025 4 commits

[Refactor] Improve Python3.9 compatibility for ParamSpec and Self (#1190) · 7d961892

Lei Wang authored Nov 04, 2025

* [Feature] Enhance fill operation to support various buffer types

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.

* lint fix

* [Refactor] Improve Python compatibility for ParamSpec and Self

- Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively.
- Updated type annotations across multiple files to ensure consistent usage of typing features.

* [Update] Require Python 3.9 and enhance type annotations

- Updated the minimum required Python version from 3.8 to 3.9 in `pyproject.toml`.
- Removed references to Python 3.8 in classifiers.
- Changed type annotations from `int | None` to `Optional[int]` in multiple example files for better clarity and compatibility.
- Improved import statements to use `collections.abc` for `Iterable` and `contextlib` for `AbstractContextManager` in relevant files.
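
The annotation change matters because `int | None` is evaluated when the function is defined and, without `from __future__ import annotations`, only works at runtime from Python 3.10. A small sketch of the 3.9-compatible spelling, with a hypothetical function `first_positive`:

```python
from collections.abc import Iterable  # subscriptable at runtime since Python 3.9
from typing import Optional


def first_positive(values: Iterable[int], default: Optional[int] = None) -> Optional[int]:
    """Return the first strictly positive value, or `default` if there is none."""
    for v in values:
        if v > 0:
            return v
    return default


print(first_positive([-2, 0, 5, 7]))  # 5
print(first_positive([-1, -2]))       # None
```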

* [Refactor] Update import statements to enhance type annotations

- Replaced imports from `typing` with `collections.abc` for `Iterable` and `Mapping` in relevant files to improve compatibility and clarity.
- Updated the caching decorator from `functools.lru_cache` to `functools.cache` for better performance in the C++ compiler retrieval function.
- Adjusted import statements in the language proxy file to maintain consistency in type annotations.
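
As an illustration of the decorator swap, `functools.cache` (added in Python 3.9) is an unbounded memoizer equivalent to `functools.lru_cache(maxsize=None)`. The function below is a hypothetical stand-in for a compiler lookup, not the project's actual retrieval function, and the candidate names are assumptions.

```python
import functools
import shutil
from typing import Optional


@functools.cache
def find_cxx_compiler() -> Optional[str]:
    """Return the path of the first C++ compiler found on PATH, or None."""
    for candidate in ("c++", "g++", "clang++"):
        path = shutil.which(candidate)
        if path:
            return path
    return None


first = find_cxx_compiler()
second = find_cxx_compiler()  # served from the cache, PATH is not searched again
print(first == second)  # True
```

Because the function takes no arguments, the cache holds a single entry, so an unbounded cache costs nothing extra and skips the eviction bookkeeping of a size-limited LRU.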

* disable rocm rs nt test.

* lint fix

7d961892

[Feature] Enhance fill operation to support various buffer types (#1189) · a03df604

Lei Wang authored Nov 04, 2025

* [Feature] Enhance fill operation to support various buffer types

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.

* lint fix

a03df604

[Fix] Remove unsupported type params (#1186) · 1768cbef
Kurisu authored Nov 04, 2025
```
* [Fix] Remove type params

* fix lint error

* [Fix] fix dtype new error
```
1768cbef

[CI] [pre-commit.ci] autoupdate (#1183) · 778b97dc

pre-commit-ci[bot] authored Nov 04, 2025

* [CI] [pre-commit.ci] autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.1 → v0.14.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.1...v0.14.3)

* [CI] sync ruff version

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>

778b97dc

03 Nov, 2025 5 commits

[Fix] fix type incompatible error in #1115 (#1180) · 4ef94f22
Kurisu authored Nov 04, 2025
```
* Fix incompatible floordiv in packed api

* fix lint
```
4ef94f22

[Language] Initial version of tilelang frontend v2 (#1120) · 5f202fe5

Kurisu authored Nov 03, 2025



* tilelang frontend v2

* syntax sugar: defining a local var by annotation

* [Refactor] fix type linting warning like `T.float32`

* Add tl.local_var_init for new tl.float32

* allow passing default argument as function annotation

* allow default arguments as annotation

* fix lint error

* minor fix

* [Refactor] refactor tilelang.jit and tilelang.autotune

* minor fix

* minor fix

* minor fix

* fix metal get function name

* add par_compile impl and tests

* Type consistency on tvm datatype
1. isinstance(tl.float32, tvm.DataType) == True
2. Allow `tl.float32` as function annotations
3. Allow `tl.float32` as argument to be passed to `tl.alloc` or other functions

* fix lint error

* add more warning in frontend

* update tvm version

* Minor fix on tvm_ffi annotations

* add document and examples

* fix lint error

* Simplify index calculations in example_chunk_o_bwd.py

Refactor index calculations for dg_last_fragment assignment.

* minor fix

* lint fix

---------
Co-authored-by: Lei Wang <leiwang1999@outlook.com>
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>

5f202fe5

[CI]: Bump actions/upload-artifact from 4 to 5 (#1178) · ba390756

dependabot[bot] authored Nov 03, 2025

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

ba390756

[CI]: Bump actions/download-artifact from 5 to 6 (#1177) · 7de095e5

dependabot[bot] authored Nov 03, 2025

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 5 to 6.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

7de095e5

[Bugfix] Legalize Datatype for mma intrinsic codegen (#1179) · 7c61d31a

Lei Wang authored Nov 03, 2025

* fix

* lint fix

* Enhance CUDA code generation by updating register type handling for float data types. Introduced a workaround for TF32 type compatibility and improved the registration of MMA register types for A and B operands.

7c61d31a

02 Nov, 2025 4 commits

[Language] Add Correctness and performance check scripts for V2 (#1174) · d99853b6
Lei Wang authored Nov 03, 2025
```
* fix

* lint fix

* fix

* lint fix

* fix

* upd
```
d99853b6

[Language] Expose `T.warpgroup_fence_operand` for nvcc code motion (#986) · aef0a6bb

Lei Wang authored Nov 03, 2025



* remove debug print

* pipeline fix

* use the correct buffer access scope

* rs support

* warp warpgroup_fence_operand

* fix

* fp8 dtype ptx enhance

* mma fix

* TCGEN05 Interface

* tcgen05 support

* rebase

* update

* Enhance TCGEN05 support by adding new intrinsic operations and descriptors. Introduced `ptx_tcgen05_mma_ts` for tensor-memory to shared-memory instructions and `tcgen05_mma_arrive` for signaling barrier completion. Updated existing descriptors and code generation logic to accommodate these changes, ensuring compatibility with new instruction sets. Refactored related allocation functions and improved handling of shared memory descriptors.

* lint fix

* Refactor buffer reference handling in CUDA code generation and update test execution in tilelang. Ensure default annotations for unrolling are set correctly in TIR IR module.

* wgmma fix

---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>

aef0a6bb

[Bugfix] Fix tvm import path for editable build (#1172) · c85bb3ac
Lei Wang authored Nov 02, 2025

c85bb3ac

[Refactor]: Change the params in pytest to avoid oom error during ci (#1170) · 13bdcd60

Yuqi Dong authored Nov 02, 2025

* [Refactor]: Change the params in pytest to avoid oom error during ci

* format

* fix

* Update test_example_cast.py

* Update parameters in test_example_cast

* Update test_example_flash_attention.py

* update

* format

* fix

* fix

* format

13bdcd60

01 Nov, 2025 1 commit

[Testing] Move TMA 1D and test for its functionality (#1167) · 5c62d00a
Zhengju Tang authored Nov 01, 2025
```
* [Testing] Move TMA 1D and test for its functionality

* [Lint]
```
5c62d00a

31 Oct, 2025 3 commits

[Bugfix] Support 16bits shfl_sync (#1169) · 54d4bd62

Lei Wang authored Oct 31, 2025

* Add type-safe warp shuffle helpers for 16-bit float types in common.h

- Introduced generic passthrough functions for warp shuffle operations: `shfl_xor_sync`, `shfl_down_sync`, `shfl_up_sync`, and `shfl_sync`.
- Added specializations for `cutlass::half_t` and `cutlass::bfloat16_t` to ensure type safety during shuffle operations.
- Updated `reduce.h` to utilize the new shuffle functions, enhancing code clarity and maintainability.

* lint fix

54d4bd62

[Bugfix] Enable code lowering with producer‑copy‑only program (#1168) · 7a80b6df

Lei Wang authored Oct 31, 2025

* bugfix

* lint fix

* Enhance warp group register allocation to handle missing consumer bodies gracefully. Updated logic to annotate producer side when consumer is absent, ensuring robustness in degenerate warp-specialized patterns.

* Refactor VisitExpr_ method in inject_tma_barrier.cc for improved readability. Adjusted formatting and spacing for clarity in barrier handling logic.

* Update barrier handling in inject_tma_barrier.cc to accommodate newly appended entries. Adjusted the size of the replace vector to ensure it covers the full needed length, and modified the logic for appending barriers based on the updated replace conditions.

7a80b6df

[FFI] Rebase tvm to v0.22.0 to utilize tvm-ffi (#1108) · 10911e28

Lei Wang authored Oct 31, 2025



* 3rdparty tvm bump

* bump tvm into v0.22.0

* lint fix

* rebase tvm

* Update submodule tvm to latest commit 3085bc4

* Refactor: Update configuration retrieval in CopyNode and adjust test registration in tilelang

* test fix

* add requirement

* atomic_fix

* atomic_fix

* phaseout py39

* optimize

* optimize

* lint fix

* do not clean cache

* do not clean cache

* [Minor] Minor update for Python versions and dependencies

* [Lint] fix lint for py39

* [Lint] fix lint for ROCm

* [Build][CI] Sync CI changes from upstream/sdist

* [Lint] fix lint for ROCm

* [Build][CI] Update `repair-wheel-command`

* [Minor] update abi3audit result format

* [Lint] fix lint for ROCm

* [BugFix] fix build

* [Lint] fix lint for ROCm

* [BugFix] set rpath for libtvm and libtvm_runtime

* [Deps] pin apache-tvm-ffi version

* [Build] set Python 3.9 Limited API for Cython target

* [Build] set Python 3.9 Limited API for Cython target

* [Deps] Restore Python 3.8 support

* [Build] use `apache-tvm-ffi`'s `libtvm_ffi`

* [BugFix] use `;` as delimiter for RPATH on macOS

* [BugFix] use `--ignore-missing-dependencies` for `delocate-wheel`

* [Build] support `sccache` if available

* [Build] add CIBW import test

* [Build][CI] enable ccache for CIBW on Linux

* [BugFix] set rpath for libtvm and libtvm_runtime

* Revert "[Build][CI] enable ccache for CIBW on Linux"

This reverts commit cd9ab57bb5ddd2572c60bcbbebde81480a658fd3.

* [CI] fix perfbench bot

* [BugFix] use Python 3.9 to build wheel

* [Minor] update perfbench bot envs

* [BugFix] fix CIBW environment on Linux

* [CI] skip import test on CentOS 7

* [CI] use Python urllib to download file instead of Wget

---------
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>

10911e28