Commits · 29051439dbed90583bfad1d16dfca88a95e78709 · OpenDAS / tilelang

12 Dec, 2025 5 commits

[Lint] Phaseout Yapf format and embrace ruff format (#1417) · 29051439
Lei Wang authored Dec 12, 2025

[Enhancement] Improve vectorization invariant check (#1398) · e84b24bc

Xiangwen Wang authored Dec 12, 2025

* Improve loop vectorize

* Improve loop vectorize

* Improve loop vectorize

* Improve loop vectorize

* Improve loop vectorize

* Add some vectorize tests and comments

[Enhancement] Introduce `T.__ldg` (#1414) · 6f67da84

Lei Wang authored Dec 12, 2025

* [Enhancement] Add __ldg intrinsic for CUDA read-only cache loads

* Introduced the __ldg intrinsic to enable explicit read-only cached loads from global memory in CUDA.
* Updated the corresponding documentation and added support in both CUDA and HIP code generation.
* Enhanced the Python interface for __ldg to accept BufferLoad and Buffer types, improving usability.

* [Enhancement] Update formatting and linting rules in pyproject.toml; minor test adjustment

* Added new formatting rules in pyproject.toml to enforce consistent code style, including hanging indents and argument splitting.
* Updated test_tilelang_language_intrinsics_codegen.py to improve readability by adding a blank line before the main execution block.
* Refactored error messages in builtin.py for better clarity and consistency, ensuring proper formatting in function definitions and raising ValueErrors.

* lint fix

[Dependency] Update TVM subproject to latest commit 2b1ead1a (#1412) · 34632a1b
Lei Wang authored Dec 12, 2025

[Dependency] Add torch-c-dlpack-ext to project requirements (#1403) · ba2c1856

Lei Wang authored Dec 12, 2025



* [Dependency] Add torch-c-dlpack-ext to project requirements

* Added torch-c-dlpack-ext to both pyproject.toml and requirements.txt to provide prebuilt torch extensions, which may prevent JIT compilation on first import of TVM FFI.

* [Build] Update manylinux images in project configuration

* Changed the manylinux image for x86_64 from "manylinux2014" to "manylinux_2_28" in both pyproject.toml and the Dockerfile to align with updated standards for compatibility and performance.

* [Build] Update CUDA repository configuration in pyproject.toml

* Changed the package manager command from `yum-config-manager` to `dnf config-manager` for adding the CUDA repository, ensuring compatibility with newer systems.

* fix

* [Build] Update CUDA repository to RHEL 8

* Changed the CUDA repository configuration in both pyproject.toml and the manylinux Dockerfile from RHEL 7 to RHEL 8, ensuring compatibility with newer systems.

* test: run out of space

* use cu130 to reduce size

* upd

* upd comment

* upd

---------
Co-authored-by: Your Name <wenji.yyc@alibaba-inc.com>

11 Dec, 2025 5 commits

[Doc] Minor documentation update (#1410) · 08262bce
Lei Wang authored Dec 12, 2025

[TypoFix] fix typo for SM120 (#1408) · ede9eaa3
Cunxiao Ni authored Dec 11, 2025

[AMD] Enable FA2 fwd on AMD MI300X (#1406) · 53be59dc
danielhua23 authored Dec 11, 2025

* enable FA2 on AMD MI300X

* make lint happy

[Dependency] Update apache-tvm-ffi version to >=0.1.2 (#1400) · 0eb33f28

Lei Wang authored Dec 11, 2025

* [Dependency] Update apache-tvm-ffi version to >=0.1.2 in project files

* [Dependency] Update subproject commit for TVM to latest version afc07935

* [Enhancement] Add support for optional step parameter in loop constructs

- Updated loop creation functions to accept an optional step parameter, enhancing flexibility in loop definitions.
- Modified ForFrame implementations to utilize the new step parameter across various loop types including serial, parallel, and pipelined loops.
- Adjusted related vectorization transformations to accommodate the step parameter, ensuring consistent behavior in loop vectorization processes.

* lint fix
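
What an optional step means for a loop's iteration count can be sketched in plain Python. This is an illustration only, not TileLang's loop API: `trip_count` is a hypothetical helper, and the sketch assumes a half-open range `[start, stop)` with a positive step that defaults to 1 when omitted.

```python
def trip_count(start, stop, step=None):
    """Iterations of a loop over [start, stop) with an optional step."""
    if step is None:
        # Assumption: omitting the step keeps the usual unit stride.
        step = 1
    if step <= 0:
        raise ValueError("this sketch only models positive steps")
    # Integer ceiling division; an empty range yields zero iterations.
    return max(0, (stop - start + step - 1) // step)


print(trip_count(0, 8))      # 8 iterations with the default step
print(trip_count(0, 10, 3))  # 4 iterations: 0, 3, 6, 9
```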

[Typo] Fix tilelang link in README.md (#1402) · 79d381d1
senlyu163 authored Dec 11, 2025

10 Dec, 2025 4 commits

[AMD] Fix 3 bugs when building Docker on AMD MI3x GPU (#1401) · d19142f6
danielhua23 authored Dec 10, 2025

[Enhancement] Refactor inflight computing to support dynamic pipeline extents (#1399) · f2858fa1

Lei Wang authored Dec 10, 2025

* [Build] Update CMake configuration for tilelang_cython_wrapper installation

- Adjusted output directories for the tilelang_cython_wrapper to ensure that development builds place the extension in build/lib.
- Updated installation paths to place the extension in tilelang/lib within the wheel, improving organization and avoiding potential conflicts with other modules.
- Modified the internal library path exposure in env.py to prevent shadowing of common module names, enhancing compatibility and usability in user projects.

* [Build] Standardize output directories for tilelang libraries

- Set output directories for both tilelang and tilelang_module libraries to "${CMAKE_BINARY_DIR}/lib" for consistency in development builds.
- This change enhances organization and ensures that all build artifacts are located in a unified directory structure.

* [Refactor] Update TVM subproject and enhance pipeline loop handling

- Updated the TVM subproject to commit 90581fe9e5287bbcf1844ad14255a1e1e8cdf7f0.
- Added new fields to `PipelineAnnotation` and `RewrittenBlockInfo` structures to track original statement indices and improve async state management.
- Refactored `EmitImpl` and `PopulateWaitCounts` methods to enhance clarity and functionality, including better handling of commit groups and wait counts.
- Simplified access index calculations and strengthened analyzer constraints for loop bounds.

* [Cleanup] Remove license block and unused includes from inject_pipeline.cc

- Eliminated the Apache license block from the top of the file to streamline the code.
- Removed unused include directives for memory and stringstream to enhance code clarity and reduce unnecessary dependencies.

* [Refactor] Enhance transformation pipeline and test execution

- Added an additional Simplify transformation in the InjectSoftwarePipeline to improve optimization.
- Updated the test file to call `test_trival_pipeline()` directly, commenting out the previous main execution for better test isolation.

[Doc] Update logging docs (#1395) · bc084aa4
Chaofan Lin authored Dec 10, 2025

[Enhancement] Add debug output methods for Layout and Fragment classes (#1392) · e7e4e65b
Kuris authored Dec 10, 2025

08 Dec, 2025 2 commits

[BugFix] Fix split kernel layout bug of GQA decode (#1386) · 242b43bb

Zhengju Tang authored Dec 08, 2025

* [BugFix] Fix split kernel layout bug of GQA decode

* [BugFix] Avoid local with Parallel; use robust fragment instead

[Bugfix][Build] Update CMake configuration to remove project root injection for sys.path (#1385) · d933d65b

Lei Wang authored Dec 08, 2025

* [Build] Update CMake configuration for tilelang_cython_wrapper installation

- Adjusted output directories for the tilelang_cython_wrapper to ensure that development builds place the extension in build/lib.
- Updated installation paths to place the extension in tilelang/lib within the wheel, improving organization and avoiding potential conflicts with other modules.
- Modified the internal library path exposure in env.py to prevent shadowing of common module names, enhancing compatibility and usability in user projects.

* [Build] Standardize output directories for tilelang libraries

- Set output directories for both tilelang and tilelang_module libraries to "${CMAKE_BINARY_DIR}/lib" for consistency in development builds.
- This change enhances organization and ensures that all build artifacts are located in a unified directory structure.

07 Dec, 2025 2 commits

[Typing] Enhance compatibility for advanced typing features in Python (#1382) · 305c854b

Lei Wang authored Dec 07, 2025

- Updated `allocate.py` and `annot.py` to improve compatibility with Python 3.9 and later by conditionally importing advanced typing features such as `TypeVarTuple`, `Unpack`, and `ParamSpec`.
- Added fallback imports from `typing_extensions` for environments using earlier Python versions.
- Improved handling of generic alias detection to ensure consistent behavior across different Python versions.
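
The fallback pattern described in these notes might look like the minimal sketch below. It is not the code from `allocate.py` or `annot.py`: the helper name `is_generic_alias` is hypothetical, and the sketch assumes `typing_extensions` is installed on interpreters older than Python 3.11.

```python
import sys
from typing import get_origin

# ParamSpec joined typing in Python 3.10; TypeVarTuple and Unpack in 3.11.
if sys.version_info >= (3, 11):
    from typing import ParamSpec, TypeVarTuple, Unpack
else:
    # Assumption: typing_extensions is available on older interpreters.
    from typing_extensions import ParamSpec, TypeVarTuple, Unpack


def is_generic_alias(tp) -> bool:
    """Hypothetical helper: True for parameterized generics such as list[int]."""
    # get_origin returns the unsubscripted class for a generic alias and
    # None for a plain class, and behaves the same way on Python 3.9+.
    return get_origin(tp) is not None


P = ParamSpec("P")
Ts = TypeVarTuple("Ts")

print(is_generic_alias(list[int]))  # True
print(is_generic_alias(int))        # False
```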

[Release] Bump version to 0.1.7 (#1377) · ce16e479

Lei Wang authored Dec 07, 2025

* Update VERSION to 0.1.7

* Update Python version in distribution scripts to support CPython 3.9 and log output

06 Dec, 2025 8 commits

[Language V2] Minor fix for complex annotations (#1381) · 6021f863
Lei Wang authored Dec 07, 2025

[Fix] typo in cuda attr (#1380) · 8f50c122

Yunqian Fan authored Dec 07, 2025

* [Bugfix] make cuda driver api compat with cuda12/13, along with tests

* fix typo in cudaDevAttr

[Bugfix] make cuda driver api compat with cuda12/13, along with tests (#1379) · a407c4a9
Yunqian Fan authored Dec 07, 2025

[Builder] Enhance variable name binding and scope management (#1378) · 3f8e6b59

Lei Wang authored Dec 07, 2025

- Improved handling of TVM Var/Buffer names to prevent out-of-scope errors when reusing Python names across different for-frames.
- Added assertions to ensure variables are defined within the correct control flow frame, enhancing error checking and code reliability.

[Language] Tilelang LazyJIT Experimental Version (#1337) · 0921328d

Kuris authored Dec 06, 2025



* initial step

* modify builder

* scratch version of new frontend

* write some tests

* add many tests

* add typing stub for tir.ir

* remove idents

* minor update

* minor update

* First version of jitv2 (renamed to LazyJIT)

* fix pre-commit error

* minor fix

* fix lint error

* fix lint error

* Fix conditional check for PrimFunc instance

---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>

[Release] Relax constraint of tvm-ffi to compatible version (#1373) · 8d019eb9
Yichen Yan authored Dec 06, 2025

Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>

[Tool] Provide layout visualization tool (#1353) · 924225ed

Cunxiao Ni authored Dec 06, 2025

* Provide layout visualization tool

Adds a layout visualization tool to TileLang, which helps users understand and debug the layout transformations applied during compilation.

This tool visualizes the memory layout of tensors at different stages of the compilation process, allowing developers to identify potential inefficiencies and optimize their code for better performance.

The visualization can be enabled via a pass config option.

* format

* add layout visual example

* Adds vis extra with matplotlib dependency

* refactor pass config name

* fix lint

* Enables configurable layout visualization formats

Allows users to specify the output formats (png, pdf, svg) for layout visualization through a pass config option.

This change provides more flexibility in how layout visualizations are generated, allowing users to choose the formats that best suit their needs.

It also fixes a bug where layout visualization was not correctly disabled when the confi...

[Enhancement] Introduce buffer var lca analysis for pass plan buffer allocations (#1376) · f8e7fef5

Lei Wang authored Dec 06, 2025

* Update submodule TVM to latest commit and add PlanAndUpdateBufferAllocationLocation function to transform module

- Updated the TVM submodule to commit 3a32b763.
- Added a new function `PlanAndUpdateBufferAllocationLocation` in the transform module to facilitate buffer allocation planning within PrimFuncs.

* Refactor buffer allocation code for improved readability and consistency

- Updated formatting and spacing in `plan_update_buffer_allocation_location.cc` for better code clarity.
- Standardized the use of pointer and reference syntax across various class methods.
- Enhanced comments for better understanding of buffer allocation logic.
- Removed unnecessary lines and improved overall code structure.

* Refactor buffer allocation checks for improved clarity

- Replaced size checks with empty checks for `ffi::Array<Buffer>` in `plan_update_buffer_allocation_location.cc` to enhance code readability.
- Updated conditions in multiple methods to use `empty()` instead of comparing size to zero, streamlining the logic.

05 Dec, 2025 1 commit

[Layout] Enhance Free Layout Inference (#1375) · 6654064d

Lei Wang authored Dec 05, 2025

* [Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py

* [Enhancement] Extend support for float8 data types in GEMM operations

- Updated GEMM operations to recognize additional float8 data types: `float8_e4m3fn` and `float8_e5m2fnuz`.
- Refactored condition checks in `checkWgmma` methods to simplify float8 type handling.
- Adjusted test cases to ensure compatibility with the new float8 types in tile language examples.

* lint fix

* [Enhancement] Add injective layout detection and exception handling

- Introduced `DetectInjective` method in `FragmentNode` to check for injective layouts.
- Added `LoopLayoutInjectiveException` to handle errors related to non-injective layouts.
- Updated `InferLayout` methods in `ParallelOpNode` to utilize injective checks and log relevant information.
- Refactored layout inference queue management to use `std::deque` for improved performance and added prioritization logic for buffer layouts.

* remove debug print

* minor layout fix

* fix for T.view

* [Enhancement] Improve injective layout detection in FragmentNode

- Updated the `DetectInjective` method to handle symbolic dimensions more effectively by introducing a mechanism to collect symbolic shapes and adjust the detection level accordingly.
- Added logging for cases where the layout detection falls back to NoCheck due to symbolic dimensions.
- Minor update to the test file to include the tilelang testing module.

* [Refactor] Simplify layout inference for bulk copy operations

- Removed unnecessary conditions for bulk load/store operations in the layout inference logic.
- Streamlined the handling of layout application for bulk copy instances to enhance clarity and maintainability.

* remove debug print

* [Enhancement] Introduce layout-related exceptions and improve error handling

- Added `LayoutConflictException` and `LoopLayoutInjectiveException` classes for better exception management in layout operations.
- Updated `InferLayout` method in `ParallelOpNode` to throw `LoopLayoutInjectiveException` with detailed error information when injective layout checks fail.
- Removed redundant exception class definitions from `parallel.h` to streamline code organization.
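
To illustrate what the injective layout check in this commit guards against, here is a brute-force sketch in plain Python. It does not reproduce `DetectInjective`: the function `is_injective` and the two example layouts are hypothetical, and a layout is modelled simply as a function from a loop index to a `(thread, local)` pair.

```python
from itertools import product


def is_injective(layout, shape):
    """Return True if no two indices in `shape` map to the same output."""
    seen = set()
    for index in product(*(range(extent) for extent in shape)):
        target = layout(*index)
        if target in seen:
            # Two distinct loop indices collide on one (thread, local) slot.
            return False
        seen.add(target)
    return True


# Hypothetical layouts over a 4 x 8 loop nest.
def good(i, j):
    # Every (i, j) gets its own (thread, local) slot.
    return (i * 2 + j // 4, j % 4)


def bad(i, j):
    # Drops the high part of j, so j and j + 4 collide.
    return (i, j % 4)


print(is_injective(good, (4, 8)))  # True
print(is_injective(bad, (4, 8)))   # False
```

A real implementation would reason symbolically rather than enumerate every index, which is presumably why the notes above mention a fallback when dimensions are symbolic.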

03 Dec, 2025 2 commits

[Refactor] Generalize fp8 process (#1372) · 92121fc6

Lei Wang authored Dec 03, 2025

* [Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py

* [Enhancement] Extend support for float8 data types in GEMM operations

- Updated GEMM operations to recognize additional float8 data types: `float8_e4m3fn` and `float8_e5m2fnuz`.
- Refactored condition checks in `checkWgmma` methods to simplify float8 type handling.
- Adjusted test cases to ensure compatibility with the new float8 types in tile language examples.

* lint fix

[Refactor]: Remove useless include in atomicadd_vectorize.h (#1371) · 1da3debf
Yuqi Dong authored Dec 03, 2025

02 Dec, 2025 5 commits

[Enhancement] Add DISABLE_CACHE environment variables (#1368) · 422fb129
Chaofan Lin authored Dec 02, 2025

[Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py (#1365) · 6501bd07
Lei Wang authored Dec 02, 2025

[Debug] Always include line info in NVCC command for improved profiling and mapping (#1364) · d88594a3
Lei Wang authored Dec 02, 2025

[Bugfix] Remove debug print in PyStmtFunctionVisitor (#1363) · f951b924
Lei Wang authored Dec 02, 2025

[CI] [pre-commit.ci] autoupdate (#1362) · e37f2eab

pre-commit-ci[bot] authored Dec 02, 2025

updates:
- [github.com/pre-commit/mirrors-clang-format: v21.1.2 → v21.1.6](https://github.com/pre-commit/mirrors-clang-format/compare/v21.1.2...v21.1.6)
- [github.com/astral-sh/ruff-pre-commit: v0.14.3 → v0.14.7](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.3...v0.14.7)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

01 Dec, 2025 5 commits

[Enhancement] Implement dynamic unroll factor in CUDA code generation (#1360) · 388ee7ee

Lei Wang authored Dec 02, 2025

* [Enhancement] Implement dynamic unroll factor in CUDA code generation

This commit introduces support for specifying a dynamic unroll factor in the CUDA code generation. The `unroll_factor` map is added to store unroll factors for loop variables, allowing for more flexible and optimized loop unrolling. Additionally, the `unroll` function is integrated into the loop language, enabling users to define unroll factors directly in their code. This enhancement improves performance by allowing tailored unrolling strategies based on specific loop characteristics.

* lint fix

* [Bugfix] Correct initialization of non-zero counters in custom compress kernel and update TIR registration for gemm_sp_py to use the correct tile operation
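
As a rough illustration of the unroll-factor map described in the first item above, the sketch below stores a factor per loop variable and consults it when emitting a loop header. It is plain Python rather than the actual code generator: `emit_loop_header` and the dictionary shape of `unroll_factor` are assumptions, while `#pragma unroll N` is the standard CUDA directive.

```python
def emit_loop_header(var, extent, unroll_factor):
    """Emit a CUDA-style for-loop header, with an unroll pragma if requested."""
    lines = []
    factor = unroll_factor.get(var)
    if factor is not None:
        # An explicit factor asks the compiler to unroll by that amount.
        lines.append(f"#pragma unroll {factor}")
    lines.append(f"for (int {var} = 0; {var} < {extent}; ++{var}) {{")
    return "\n".join(lines)


# Hypothetical map from loop variable name to unroll factor.
unroll_factor = {"k": 4}

print(emit_loop_header("k", 16, unroll_factor))
print(emit_loop_header("i", 8, unroll_factor))
```

Loops without an entry in the map are emitted without a pragma, leaving the unrolling decision to the compiler.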

[Bugfix] Update TIR registration for GemmSPPy to use tile operation (#1361) · e547d247
Lei Wang authored Dec 01, 2025

[Language] support `T.gemm_sp_v2` on sm80 and sm89 (#1056) · 283a9a00

botbw authored Dec 01, 2025

* [misc] add a cpp side wrapper for gemm_sp_py

* [misc] typing

* [IR] bind GemmSPWarpPolicy

* [chore] add wrapper code

* [IR] fix GemmSPWarpPolicy

* [codegen] apply ptxas instructions

* [intrinsic] add typical (unused) mma layout

* [template] add uint16 debug func

* [intrinsic] add b matrix layout

* [gemm_sp] enable fp16/bf16 on sm8x

* [layout] refactor fp16/bf16 layout

* [gemm_sp] enable int8

* [chore] update test case dtype

* [gemm_sp] enable fp32

* [layout] refactor layouts

* [intrinsic] enable ldmatrix for mat A

* [layout] enable ldsm for matrix b

* [layout] add ldmatrix for fp32 and fp8

* [chore] refine

* [chore] refactor

* [chore] add fp8 efactor

* [chore] refactor

* [chore] add remove negative zero util

* [example] add a custom compress kernel

* [chore] minor update

* [test] refactor gemm_sp test

* [refactor] make metadata layout func

* [example] add option for using cutlass layout

* [doc] add a gemm_sp doc

* [doc] minor polish

* [chore] remove unused

* [bugfix] fix non replicate b case

* [test] refactor

* [chore] add a check

* [bugfix] fix util bug

* [wip] init a new test case for v2

* [chore] minor refactor

* [chore] minor update

* [bugfix] enable 16bit rs

* [language] enable rs

* [language] enable gemm_sp_sr

* [language] enable gemm_sp_rr

* [test] enable more tests

* [tvm] update ffi binding

* [chore] remove print

* [chore] fix benchmark script

* [lint] precommit lint

* [chore] apply feedback

* [test] use arch 8.0

* [chore] rollback ::ordered_metadata for backward compatibility

* [bugfix] fix capitalized

* [example] keep gemm_sp on hopper

* [test] fix no fp8 normal kernel

* [test] reduce matmul size to satisfy accum error

* [test] use cal_diff for assertion

* [bugfix] expand float8 type

* [lib] add make_int4 for short type

* [language] add transpose E

* [bugfix] fix wrong var

* [format] format

* [chore] refactor binding

* [chore] fix wrong passing var

[Analysis] Enhance NestedLoopChecker with tile op cases (#1358) · b10ef75f
Chaofan Lin authored Dec 01, 2025

* [Analysis] Enhance NestedLoopChecker with tile op cases

* fix tileop issue

[Refactor] Update Fragment Indexing in ParallelOpNode's InferLayout Method (#1359) · 1b42c87b

Lei Wang authored Dec 01, 2025

This commit refines the Fragment creation process in the InferLayout method of ParallelOpNode. It removes the unnecessary forward_index array and utilizes default fragment indexing for consistency with other operations. Additionally, it binds the thread range to enhance comparability across different operations.

30 Nov, 2025 1 commit

[Bugfix] Fix the jit_kernel issue (#1357) · c6a19fb2

Leon Lu authored Nov 30, 2025



* [Bugfix] Fix the jit_kernel issue

* Update README.md

---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
