Commits · 9c21586b54cde5dd7449f5471502983b5f66d9c7 · OpenDAS / tilelang

17 Dec, 2025 1 commit

[Feat] Integrate Z3 in TVM Arith Analyzer (#1367) · 9c21586b

Kuris authored Dec 17, 2025

13 Dec, 2025 1 commit

[CUDA] Add read-only parameter annotation for CUDA codegen (#1416) · 00dd7388

Lei Wang authored Dec 14, 2025

* [Enhancement] Add read-only parameter annotation for CUDA codegen

* Introduced the `AnnotateReadOnlyParams` transformation to annotate read-only handle parameters in PrimFuncs, enabling the generation of `const` qualifiers in CUDA codegen.
* Updated `PrintFunctionSignature` and `AddFunction` methods to utilize the new attribute `tl.readonly_param_indices`, which can improve performance by letting read-only data be loaded through the read-only cache.
* Modified the optimization pipeline to run the new annotation step before code generation, so the attribute is available when function signatures are printed.

* lint fix

* [Dependency] Update apache-tvm-ffi version to >=0.1.3

* Updated the version of apache-tvm-ffi in pyproject.toml, requirements.txt, and requirements-dev.txt to ensure compatibility with the latest features and fixes.
* Made adjustments in CUDA and HIP template files to use `const` qualifiers for global pointer parameters, enhancing code safety and clarity.

* lint fix

* [Enhancement] Refactor ReadWriteMarker for improved parameter handling

* Updated the ReadWriteMarker class to accept a set of parameter or data variables, enhancing its ability to track written variables.
* Introduced a new method, ResolveDataVarFromPtrArg, to resolve underlying buffer data from pointer-like arguments, improving accuracy in identifying written variables.
* Modified the MarkReadOnlyParams function to gather handle parameters and their corresponding buffer data variables, streamlining the process of determining read-only parameters.
* Enhanced the logic for identifying written variables to account for aliased data variables, ensuring comprehensive tracking of modifications.

* lint fix

* Update tma_load function to use const qualifier for global memory pointer

* Changed the parameter type of gmem_ptr in the tma_load function from void* to void const* to enhance type safety and clarity in memory operations.
* This lets callers pass read-only (`const`) global memory pointers to `tma_load` without casting away `const`.

* Remove commented-out code and reorder transformations in OptimizeForTarget function for clarity

* Refactor buffer marking logic in annotate_read_only_params.cc to improve accuracy in identifying written variables. Update OptimizeForTarget function to reorder transformations for better clarity.
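
A minimal Python sketch of the idea behind `AnnotateReadOnlyParams` and the `const`-qualified signature, assuming a toy IR in which every handle parameter points at one buffer data variable and writes are recorded by variable name, with an alias table standing in for `ResolveDataVarFromPtrArg`. `Param`, `read_only_param_indices` and `print_signature` are hypothetical names, not TileLang's C++ API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Param:
    """Toy stand-in for a PrimFunc handle parameter (hypothetical)."""
    name: str      # parameter name printed in the kernel signature
    dtype: str     # C element type, e.g. "float"
    data_var: str  # buffer data variable this handle points to


def resolve_data_var(var: str, aliases: Dict[str, str]) -> str:
    """Follow pointer-like aliases back to the root buffer data variable
    (the role the log assigns to ResolveDataVarFromPtrArg)."""
    seen = set()
    while var in aliases and var not in seen:  # `seen` guards against cycles
        seen.add(var)
        var = aliases[var]
    return var


def read_only_param_indices(params: List[Param], written_vars: Iterable[str],
                            aliases: Dict[str, str]) -> List[int]:
    """Indices of handle parameters whose buffer is never written,
    either directly or through an alias."""
    written = {resolve_data_var(v, aliases) for v in written_vars}
    return [i for i, p in enumerate(params) if p.data_var not in written]


def print_signature(func_name: str, params: List[Param], readonly: Iterable[int]) -> str:
    """Emit a CUDA-style kernel signature with `const` on read-only pointers."""
    readonly_set = set(readonly)
    args = [("const " if i in readonly_set else "") + f"{p.dtype}* {p.name}"
            for i, p in enumerate(params)]
    return f'extern "C" __global__ void {func_name}({", ".join(args)})'


# Usage: C is written only through a pointer derived from its data variable.
params = [Param("A", "float", "A_data"), Param("B", "float", "B_data"),
          Param("C", "float", "C_data")]
indices = read_only_param_indices(params, written_vars=["C_ptr"],
                                  aliases={"C_ptr": "C_data"})
print(indices)  # [0, 1]
print(print_signature("main_kernel", params, indices))
```

Resolving aliases before the read-only test is what keeps a buffer that is only written through a derived pointer (here `C_ptr`) from being wrongly marked `const`.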


11 Dec, 2025 1 commit

[Dependency] Update apache-tvm-ffi version to >=0.1.2 (#1400) · 0eb33f28

Lei Wang authored Dec 11, 2025

* [Dependency] Update apache-tvm-ffi version to >=0.1.2 in project files

* [Dependency] Update subproject commit for TVM to latest version afc07935

* [Enhancement] Add support for optional step parameter in loop constructs

- Updated loop creation functions to accept an optional step parameter, enhancing flexibility in loop definitions.
- Modified ForFrame implementations to utilize the new step parameter across various loop types including serial, parallel, and pipelined loops.
- Adjusted related vectorization transformations to accommodate the step parameter, ensuring consistent behavior in loop vectorization processes.

* lint fix
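
The optional `step` parameter changes how many iterations a loop runs and which indices a vectorized loop groups together. A minimal sketch of that arithmetic, assuming a half-open range `[start, stop)` with a positive step; the log does not say how TileLang's `ForFrame` normalizes stepped loops, so `trip_count` and `vector_chunks` are hypothetical helpers.

```python
from typing import List, Optional


def trip_count(start: int, stop: int, step: int = 1) -> int:
    """Iterations of `for i in range(start, stop, step)` for a positive step."""
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    return max(0, (stop - start + step - 1) // step)


def loop_indices(start: int, stop: int, step: int = 1) -> List[int]:
    """Indices visited by a stepped loop: i = start + k * step."""
    return [start + k * step for k in range(trip_count(start, stop, step))]


def vector_chunks(start: int, stop: int, step: int, lanes: int) -> Optional[List[List[int]]]:
    """Group stepped iterations into vectors of `lanes` indices.

    Simplifying assumption: vectorize only when the trip count is a multiple
    of `lanes`. With step > 1 each vector touches elements `step` apart, i.e.
    a strided rather than contiguous access.
    """
    n = trip_count(start, stop, step)
    if n % lanes != 0:
        return None
    idx = loop_indices(start, stop, step)
    return [idx[k:k + lanes] for k in range(0, n, lanes)]


print(loop_indices(0, 10, 3))      # [0, 3, 6, 9]
print(vector_chunks(0, 16, 2, 4))  # [[0, 2, 4, 6], [8, 10, 12, 14]]
```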


05 Nov, 2025 1 commit

[SM70] Refactor and minor fix for SM70 (#1195) · 4a9cb470

Lei Wang authored Nov 06, 2025

* [Feature] Add support for SM70 tensor core MMA instructions

- Introduced new intrinsic `ptx_mma_sm70` for Volta GPUs, enabling m16n16k4 shape with FP16 inputs and FP16/FP32 accumulation.
- Added `GemmMMASm70` class for handling GEMM operations specific to SM70 architecture.
- Implemented layout functions for Volta swizzled layouts and updated existing GEMM layout inference logic.
- Updated `requirements-dev.txt` to include `apache-tvm-ffi` dependency.
- Added correctness evaluation script for testing GEMM operations on SM70.

* [Refactor] Update formatting and installation commands in scripts

- Modified `format.sh` to install `pre-commit` and `clang-tidy` with the `--user` flag for user-specific installations.
- Improved readability in `correctness_evaluation_sm70.py` by adjusting the formatting of pytest parameters.
- Cleaned up spacing and formatting in various C++ source files for better consistency and readability.
- Removed unnecessary comments and improved layout function definitions in `mma_sm70_layout.py` and `mma_sm70_macro_generator.py` for clarity.
- Ensured consistent formatting in layout initialization and swizzle functions.

* typo fix
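
For reference, the math of one m16n16k4 MMA described above is D = A·B + C with A of shape 16×4, B of shape 4×16 and C, D of shape 16×16. This pure-Python sketch rounds inputs to FP16 with `struct`'s half-precision `'e'` format and rounds the accumulator after every step to FP32 (`'f'`) or FP16. It models numerics only, not the warp-level fragment layout or the hardware's summation order, and `mma_m16n16k4_ref` is a hypothetical name.

```python
import struct

M, N, K = 16, 16, 4  # shape of one SM70 MMA tile as described in the log


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma_m16n16k4_ref(a, b, c, accum_fp32=True):
    """Reference D = A @ B + C for A: MxK, B: KxN, C: MxN (lists of lists).

    Inputs are rounded to FP16; the accumulator is rounded to FP32 or FP16
    after every multiply-add.
    """
    round_acc = to_fp32 if accum_fp32 else to_fp16
    d = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = round_acc(c[i][j])
            for k in range(K):
                acc = round_acc(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d


a = [[1.0] * K for _ in range(M)]
b = [[1.0] * N for _ in range(K)]
c = [[0.0] * N for _ in range(M)]
print(mma_m16n16k4_ref(a, b, c)[0][0])  # 4.0
```

The FP16/FP32 choice matters once the accumulator is large: starting from 4096, FP16 spacing is 4, so adding 1.0 four times leaves 4096 in FP16 but reaches 4100 in FP32.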


15 Oct, 2025 1 commit

[CI][Refactor] Merge test CI workflow files into one (#973) · 8ce27782

Xuehai Pan authored Oct 15, 2025

* refactor: merge test CI workflow files into one

* chore: set `UV_INDEX_STRATEGY=unsafe-best-match`

* feat: add AST test with Python 3.8

* feat: implement manual caching mechanism for self-hosted runners

* refactor: simplify cache logic for self-hosted runners

* chore: clear uv cache on failure

* chore: print format.sh output to logs

* chore: improve uv caching

* chore: disable parallel test

* chore: use `PYTHONDEVMODE=1` in CI

* feat: enable coredump generation

* fix: fix perfbench condition

* Revert "feat: enable coredump generation"

This reverts commit c52da65cb572932e09905d08c43a39ec3cf47c54.

* chore: move example CI down

* Revert "chore: move example CI down"

This reverts commit 9d8e65055e01d955c5268a9a6705d270c2de0d57.

* chore: skip example `test_example_mha_sink_bwd_bhsd`

* chore: skip example `test_example_gqa_sink_bwd_bhsd`

* fix: fix example argument passing

* fix: loosen test criteria

* chore: rename `CMAKE_CONFIG...
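
The log does not describe how the manual caching mechanism for self-hosted runners computes its keys. A common approach, assumed here for illustration, is to derive the key from a SHA-256 digest of the dependency files, so the cache is invalidated exactly when one of them changes. `cache_key` is a hypothetical helper.

```python
import hashlib
import platform
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, dependency_files: Iterable[str], python_version: str) -> str:
    """Build a cache key that changes whenever any dependency file changes."""
    digest = hashlib.sha256()
    for name in sorted(dependency_files):  # sorted so argument order does not matter
        digest.update(name.encode())
        digest.update(b"\0")               # separator between name and contents
        digest.update(Path(name).read_bytes())
        digest.update(b"\0")
    os_name = platform.system().lower()
    return f"{prefix}-{os_name}-py{python_version}-{digest.hexdigest()[:16]}"
```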


14 Oct, 2025 1 commit

[Build] Prefer libs from local build dir (#1027) · 0f515b86

Yichen Yan authored Oct 14, 2025

* Load libs from build dir, if present, to support faster rebuild.

* typo

* upd

* refine check

* md lint
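
A minimal sketch of "load libs from build dir, if present": search an ordered list of candidate directories with the local build directory first, and return the first match. The directory layout and the helper name `find_library` are assumptions for illustration, not TileLang's actual library-search code.

```python
import os
from typing import Iterable, Optional


def find_library(lib_name: str, build_dir: Optional[str],
                 install_dirs: Iterable[str]) -> Optional[str]:
    """Return the path of `lib_name`, preferring a local build directory.

    Checking the build tree first means a freshly rebuilt C++ library is
    picked up without reinstalling the Python package.
    """
    candidates = [build_dir] if build_dir else []
    candidates.extend(install_dirs)
    for directory in candidates:
        path = os.path.join(directory, lib_name)
        if os.path.isfile(path):
            return path
    return None
```

When the build directory does not contain the library, the search simply falls through to the installed location.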


13 Oct, 2025 1 commit

[Build] Migrate to scikit-build-core (#939) · d89ba5b8

Yichen Yan authored Oct 13, 2025

* cleanup

* init

* build first wheel that may not work

* build cython ext

* fix tvm build

* use sabi

* update rpath to support auditwheel

* pass editable build

* update ci

* fix warnings

* do not use ccache in self host runner

* test local uv cache

* test pip index

* update lib search to respect new lib location

* fix

* update ci

* enable cuda by default

* update src map

* fix

* fix

* fix

* Generate version with backend and git information at build time

* copy tvm_cython to wheels

* fix tvm lib search

* fmt

* remove unused

* auto detect ccache

* add back backend-related files

* remove jit cython adaptor to simplify code

* fmt

* fix ci

* ci fix 2

* ci fix 3

* workaround metal

* ci fix 4

* fmt

* fmt

* Revert "ci fix 4"

This reverts commit d1de8291c3e40927955f3ad3cf87a75c78813676.

* tmp

* fix metal

* trivial cleanup

* add detailed build-time version for cuda

* add back mlc

* Restore wheel info and other trivial updates

* update

* fix cuda

* upd

* fix metal ci

* test for ga build

* test for nvidia/cuda

* test ubuntu 20

* fix

* fix

* Do not use `uv build`

* fix

* fix

* log toolchain version

* merge wheel

* update

* debug

* fix

* update

* skip rocm

* update artifacts each

* fix

* fix

* add mac

* fix cache

* fix cache

* fix cache

* reset and add comment

* upd

* fix git version

* update deps

* trivial update

* use in-tree build dir and install to src to speedup editable build

* Revert "use in-tree build dir and install to src to speedup editable build"

This reverts commit 6ab87b05c5eed811210136b8dca4fc3677dd51f2.

* add build-dir

* update docs

* remove old scripts

* [1/n] cleanup scripts

* [Lint]: [pre-commit.ci] auto fixes [...]

* fix and update

* wait for tvm fix

* revert some tmp fix

* fix

* fix

* spell

* doc update

* test cibuildwheel

* fix and test macos on ci

* Update .github/workflows/dist.yml
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>

* fix

* test ga event

* cleanup

* bump tvm to support api3

* test final version

* add cron

* Update .github/workflows/dist.yml
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>

* fix

* test ccache for metal cibuildwheel

* test newer macos

* finish

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
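
The migration above generates a version "with backend and git information at build time". One way to encode that, sketched here under assumptions, is a PEP 440 local version label such as `0.1.0+cu128.gitabc1234`, in the style of PyTorch's `+cu128` wheel tags. The exact label format TileLang uses is not given in this log, and `make_version` and `git_short_hash` are hypothetical helpers.

```python
import re
import subprocess
from typing import Optional


def git_short_hash(repo_dir: str = ".") -> Optional[str]:
    """Short commit hash of `repo_dir`, or None if git or the checkout is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def make_version(base: str, backend: Optional[str] = None,
                 cuda_version: Optional[str] = None,
                 git_hash: Optional[str] = None) -> str:
    """Append backend and git details to `base` as a PEP 440 local version label."""
    parts = []
    if backend == "cuda" and cuda_version:
        parts.append("cu" + cuda_version.replace(".", ""))  # "12.8" -> "cu128"
    elif backend:
        parts.append(backend)
    if git_hash:
        parts.append("git" + git_hash)
    label = ".".join(parts)
    # Keep to the normalized local-label form: alphanumerics separated by periods.
    if label and not re.fullmatch(r"[A-Za-z0-9]+(\.[A-Za-z0-9]+)*", label):
        raise ValueError(f"invalid local version label: {label!r}")
    return f"{base}+{label}" if label else base


print(make_version("0.1.0", backend="cuda", cuda_version="12.8",
                   git_hash=git_short_hash()))
```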


08 Aug, 2025 1 commit

[Layout] Introduce a new layout inference mechanism (#699) · 407117e1

Lei Wang authored Aug 08, 2025

* Implement new free stage layout inference.

* Fix bug

* Make replication upcasting and unnormalizable iterators safe.

* Better handling of updates with more replicas

* Remove unnecessary check.

* Fix compilation.

* Fix setup.py.

* Simplify development mode.

* Allow ParallelOp layout when there's already a compatible layout specified

* lint fix

* Add ProveFragmentContains function to validate thread access between small and large fragments

This function checks if the threads accessing elements of a smaller fragment are a subset of those accessing a larger fragment, ensuring valid access during updates. The implementation includes deriving thread indices, computing logical indices, and verifying thread mappings.

* Update dependencies in requirements files

* Remove 'thefuzz' from requirements-dev.txt
* Specify exact versions for 'torch' and add 'flash_attn' in requirements-test.txt

* Update CI workflow to use SHA256 hash for requirements file

* Update requirements and CI workflow for flash attention

* Removed specific version for 'torch' in requirements-test.txt
* Added installation of 'flash_attn==2.5.8' in CI workflow to ensure compatibility

* Refactor flash attention import handling in examples

* Removed availability checks for 'flash_attn' in multiple example scripts.
* Simplified import statements for 'flash_attn' to ensure consistent usage across examples.

---------
Co-authored-by: Huanqi Cao <caohuanqi@deepseek.com>
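
The `ProveFragmentContains` check described above can be illustrated by brute force. Assume each fragment layout is a function from a shared logical index to the set of threads holding that element (more than one thread means the element is replicated); the small fragment is then contained in the large one when, for every index of the small fragment, its threads are a subset of the large fragment's threads for the same index. This exhaustive enumeration and the layouts below are hypothetical; a real implementation would reason about the index expressions rather than enumerate them.

```python
from typing import Callable, Iterable, Set, Tuple

Index = Tuple[int, ...]
Layout = Callable[[Index], Set[int]]  # logical index -> threads holding it


def prove_fragment_contains(small: Layout, large: Layout,
                            small_indices: Iterable[Index]) -> bool:
    """True if every thread accessing an element of the small fragment also
    accesses that element in the large fragment."""
    return all(small(idx) <= large(idx) for idx in small_indices)


def large_layout(idx: Index) -> Set[int]:
    (i,) = idx
    return {i % 32}            # 64 elements over 32 threads, 2 per thread


def small_layout(idx: Index) -> Set[int]:
    (i,) = idx
    return {i}                 # 32 elements, one per thread


def shifted_layout(idx: Index) -> Set[int]:
    (i,) = idx
    return {(i + 1) % 32}      # same shape, owned by the "wrong" thread


def replicated_layout(idx: Index) -> Set[int]:
    (i,) = idx
    return {i % 16, i % 16 + 16}  # each element held by two threads


indices = [(i,) for i in range(32)]
print(prove_fragment_contains(small_layout, large_layout, indices))    # True
print(prove_fragment_contains(shifted_layout, large_layout, indices))  # False
```

Replication is where the direction matters: an unreplicated fragment fits inside a replicated one, but not the other way round.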


19 Apr, 2025 1 commit

[Refactor] Phase out Python dependencies `attrs` and `decorator` (#411) · 5c7e2fa8

Lei Wang authored Apr 20, 2025

* Phase out attr

* Remove unused dependencies from requirements files

* Update TVM submodule to latest commit 4776d31
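
The log does not show what replaced `attrs` and `decorator`. A common standard-library substitution, shown here as an assumption rather than the commit's actual diff, is `dataclasses` for attribute classes and `functools.wraps` for signature-preserving decorators. `TileConfig`, `log_calls` and `scale` are hypothetical examples.

```python
import dataclasses
import functools
import inspect


# Stands in for an attrs class: a frozen dataclass provides __init__,
# __repr__, __eq__ and immutability.
@dataclasses.dataclass(frozen=True)
class TileConfig:
    block_m: int = 128
    block_n: int = 128
    num_stages: int = 2


# Stands in for a decorator built with the `decorator` package:
# functools.wraps copies __name__ and __doc__ and sets __wrapped__,
# which inspect.signature() follows to report the original parameters.
def log_calls(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@log_calls
def scale(x: float, factor: float = 2.0) -> float:
    """Multiply x by factor."""
    return x * factor


print(inspect.signature(scale))  # (x: float, factor: float = 2.0) -> float
```

Unlike the `decorator` package, `functools.wraps` does not generate a wrapper with the exact signature; tools that rely on `inspect.signature` still see the original parameters through `__wrapped__`.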


22 Feb, 2025 1 commit

[Wheel] Provide bare Docker scripts to help build wheels for manylinux (#105) · b4bd2a56

Lei Wang authored Feb 22, 2025

* [Build] Improve build configuration and package distribution support

- Add `build` to requirements-build.txt for package building
- Update MANIFEST.in to include Cython wrapper source file
- Enhance setup.py to improve Cython file copying logic
- Update build scripts to support multi-Python version distribution

* [Build] Improve Cython file handling in setup.py and MANIFEST.in

- Remove Cython wrapper from MANIFEST.in
- Enhance setup.py to create target directory if it doesn't exist when copying Cython files
- Improve file copying logic for Cython source files during build process

* [Build] Remove Cython file copying logic in setup.py

- Comment out Cython file copying code in TileLangBuilPydCommand
- Simplify setup.py build process by removing redundant Cython file handling

* [Build] Enhance Docker distribution scripts for multi-Python version support

- Refactor local and PyPI distribution Docker scripts
- Replace hardcoded Python installation with Miniconda-based multi-version Python environment
- Improve Docker image setup with dynamic Python version creation
- Simplify build process by using Miniconda for Python environment management

* [Build] Separate lint requirements into a dedicated file

- Create new requirements-lint.txt for formatting and linting tools
- Update format.sh to use requirements-lint.txt instead of requirements-dev.txt
- Update requirements-dev.txt and requirements-test.txt to reference requirements-lint.txt
- Improve dependency management by isolating lint-specific requirements

* [Build] Restore Cython file copying logic in setup.py

- Re-add Cython file copying mechanism in TileLangBuilPydCommand
- Implement robust file search across multiple potential directories
- Add warning for cases where Cython source files cannot be found
- Improve build process reliability for Cython source files

* [Build] Refactor Cython file copying logic in setup.py

- Simplify Cython file copying mechanism in TileLangBuilPydCommand
- Improve directory creation and file copying for Cython source files
- Relocate potential directories list to a more logical position
- Enhance robustness of file and directory handling during build process

* [Build] Refine Cython file copying logic in setup.py

- Improve file existence check when copying Cython source files
- Use os.path.join to construct full path for more robust file checking
- Enhance file copying mechanism in TileLangBuilPydCommand
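
The `setup.py` changes above describe searching several candidate directories for the Cython source, checking full paths, creating the target directory when it is missing, and warning when nothing is found. A minimal sketch of that logic with a hypothetical function name; the real `TileLangBuilPydCommand` code is not shown in this log.

```python
import os
import shutil
import warnings
from typing import Iterable, Optional


def copy_cython_source(filename: str, potential_dirs: Iterable[str],
                       target_dir: str) -> Optional[str]:
    """Copy `filename` from the first candidate directory that contains it.

    Returns the destination path, or None (after a warning) if no candidate
    directory holds the file.
    """
    searched = list(potential_dirs)
    for directory in searched:
        source = os.path.join(directory, filename)  # check the full path
        if os.path.isfile(source):
            os.makedirs(target_dir, exist_ok=True)  # create target if missing
            destination = os.path.join(target_dir, filename)
            shutil.copy2(source, destination)
            return destination
    warnings.warn(f"Cython source {filename!r} not found in: {', '.join(searched)}")
    return None
```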


10 Feb, 2025 1 commit

[Dev] Remove unnecessary python dependencies (#69) · 2411fa28

Lei Wang authored Feb 10, 2025

* [Enhancement] Add VectorizeLoop function and update imports for compatibility

* [CI][Test] Improve test cases for vectorization and fix typos in parser comments

* lint fix

* Fix incorrect module reference for VectorizeLoop transformation

* Refactor vectorize_loop transformation by removing unused extent mutation logic

* [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen

* Fix formatting in CUDA FP8 header file for consistency

* Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity

* Update submodule 'tvm' to latest commit for improved functionality

* Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.

* Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.

* Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency

* Add CUDA requirements to FP8 test cases and update references for clarity

* Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py

* Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py

* Add CUDA requirements and FP8 test cases for matmul and gemv simulations

* Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py

* Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py

* Add BF16 support to matrix multiplication and introduce corresponding test cases

* Add a blank line for improved readability in BF16 GEMM test

* Update acknowledgements in README to include supervision by Zhi Yang at Peking University

* enhance acknowledgement

* Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives

* Update subproject commit for TVM dependency

* Update subproject commit for TVM dependency

* Add int4_t type and functions for packing char values in CUDA common header

* Add plot_layout example and implement GetForwardVars method in layout classes

* Refactor code for improved readability by adjusting line breaks and formatting in layout and test files

* Fix formatting by removing unnecessary line break in layout.h

* Refactor make_int4 function for improved readability by adjusting parameter formatting

* Add legend to plot_layout for improved clarity of thread and local IDs

* Remove unnecessary dependencies from requirements files for cleaner setup

* Remove flash_mha.py and add .gitkeep to deepseek_mla directory

* Add build requirements and update installation scripts for improved setup
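
The CUDA header changes for packing char values are not shown in this log. As an illustration only: packing usually means placing four signed 8-bit values in one 32-bit word with the first value in the lowest byte, which matches how a little-endian GPU holds a `char4` in a 32-bit register, and sixteen such values fill the four lanes of an `int4`. The helper names below are hypothetical.

```python
import struct
from typing import Sequence, Tuple


def pack_chars(values: Sequence[int]) -> int:
    """Pack four signed 8-bit values into one signed 32-bit integer,
    values[0] in the least significant byte (little-endian)."""
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    return struct.unpack("<i", struct.pack("<4b", *values))[0]


def unpack_chars(word: int) -> Tuple[int, ...]:
    """Inverse of pack_chars."""
    return struct.unpack("<4b", struct.pack("<i", word))


def pack_int4(values: Sequence[int]) -> Tuple[int, ...]:
    """Pack sixteen signed chars into four 32-bit lanes (x, y, z, w)."""
    if len(values) != 16:
        raise ValueError("expected exactly sixteen values")
    return tuple(pack_chars(values[i:i + 4]) for i in range(0, 16, 4))


print(hex(pack_chars([1, 2, 3, 4])))  # 0x4030201
```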


25 Jan, 2025 1 commit

[CI][Test] Add test cases for tilelang kernel FlashAttention (#54) · bedab1a0

Yu Cheng authored Jan 26, 2025

* [Dev] Add FlashDecoding example

* [CI][Test] Add test cases for tilelang kernel convolution

* [CI][Test] Add test cases for tilelang kernel FlashAttention

* Reduce the number of stages to ensure the shared memory allocation is valid

* Temporarily remove the dim128 case

* lint

* update einops in requirements-dev.txt

* update einops in requirements-test.txt

* remove einops in requirements-dev.txt
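
Reducing the number of pipeline stages helps because each stage keeps its own copy of the streamed tiles in shared memory, so the footprint grows linearly with the stage count. A back-of-the-envelope sketch for an attention kernel, assuming (hypothetically) that each stage buffers one K tile and one V tile of `block_n × dim` elements while a single Q tile of `block_m × dim` stays resident; real kernels may allocate further buffers, and the 96 KiB budget is only an example value.

```python
def attention_smem_bytes(stages: int, block_m: int, block_n: int, dim: int,
                         dtype_bytes: int = 2) -> int:
    """Approximate shared memory: one Q tile plus `stages` copies of K and V tiles."""
    q_tile = block_m * dim * dtype_bytes
    kv_per_stage = 2 * block_n * dim * dtype_bytes
    return q_tile + stages * kv_per_stage


def max_stages(block_m: int, block_n: int, dim: int, smem_limit: int,
               dtype_bytes: int = 2) -> int:
    """Largest stage count whose footprint fits within `smem_limit` bytes (0 if none)."""
    fixed = block_m * dim * dtype_bytes
    per_stage = 2 * block_n * dim * dtype_bytes
    return max(0, (smem_limit - fixed) // per_stage)


LIMIT = 96 * 1024  # hypothetical per-block budget in bytes
for dim in (64, 128):
    s = max_stages(64, 64, dim, LIMIT)
    print(dim, s, attention_smem_bytes(s, 64, 64, dim))
# 64 5 90112
# 128 2 81920
```

In this model every tile doubles when `dim` goes from 64 to 128, which is why head dimension 128 tolerates far fewer stages under the same budget.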


11 Jan, 2025 1 commit

[Initialization] Migration of Codebase from Dev Branch into Main (#10) · 57ab687c

Lei Wang authored Jan 11, 2025

* Add format.sh script for code formatting and linting

* docs update

* center align the title

* lint fix

* add ignore

* Add .gitignore for 3rdparty directory

* Add requirements-dev.txt, requirements-test.txt, and requirements.txt

* 3rdparty

* Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h

* Refactor CMakeLists.txt and include statements

- Update CMakeLists.txt to use a newer version of CMake and add project name
- Remove unnecessary include directories

Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc

- Update include paths to use relative paths instead of absolute paths

* Update submodule for 3rdparty/tvm

* update

* load dll first

* Refactor CMakeLists.txt and include statements

* Refactor CMakeLists.txt and include statements

* git keep update

* Refactor CMakeLists.txt and include statements

* Refactor CMakeLists.txt and include statements

* refactor code structure

* Update Readme

* CMakeLists Customized

* update readme

* update README

* update readme

* update usage

* use TVM_IMPORT_PYTHON_PATH to handle importing the Python package of a self-built TVM

* annotate lower transform global func with `transform` prefix

* Migrate Simplify Pass from tilelang tvm branch

* enhance system environment handling with __init__ and CMake

* Initial commit

* CODE_OF_CONDUCT.md committed

* LICENSE committed

* README.md committed

* SECURITY.md committed

* SUPPORT.md committed

* CODE_OF_CONDUCT Commit

* LICENSE Commit

* SECURITY Commit

* SUPPORT Commit

* Modify Support

* Update README.md

* security ci update

* remove examples

* Update and implement clang-format

* add composable kernel components

* Migrate from latest update

* submodule update

* Test update

* Update License

* Spell check

* lint fix

* add clang-tidy to apply static analysis for c source

* update tilelang examples

* Update Install Docs

* Refactor filetree

* Enhance Install

* conflict resolved

* annotate_version

* Initial Update

* test fix

* install

* Implement setup.py

* lint fix

* Separate Init

* Separate test

* docker file commit

* add logo

* Update Readme and Examples

* update readme

* update logo

* Implement AMD Installation

* Add License

* Update AMD MI300x Benchmark

* update README

* update mi300 benchmark scripts

* update ignore

* enhance build script

* update image

* enhance setup.py to remove duplicated libraries

* remove debug files

* update readme

* update image

* update gemm examples

* update flashattention README

* readme update

* add cmake into requirements

* libinfo fix

* auto update submodule

* lint fix

* Fix AMD Build and Test

* Update check for transpose attribute for CDNA Arch

* typo fix for amd

* Implement Matmul Benchmark

* Refactor Code

* [TypoFix] Fix GEMM Example

* [Docs] Init Linear Attention README

* [TYPO] Typo fix

* [Lint] Lint Fix

* enhance example with intrinsics

* [Enhancement] Improve Buffer Collection during IR Parser

* [Dev] Introduce Current classmethod to get current frame

* submodule update

* fake test pass update

* support thread_extent_api

* code optimize

* Add GEMM function implementation for matrix multiplication

* Update logging format to reflect TileLang in logger messages

* Refactor CMakeLists.txt for improved readability and set default build type to Release

* Support Gemm SS Primitives Implementation

* [README] Upload Tile Language Logo (#5)

* update logo

* Update README.md to enhance formatting and center the title

---------
Co-authored-by: microsoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com>
Co-authored-by: Microsoft Open Source <microsoftopensource@users.noreply.github.com>
Co-authored-by: Yu Cheng <yu.cheng@pku.edu.cn>
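
One entry above mentions `TVM_IMPORT_PYTHON_PATH` for importing a self-built TVM. A minimal sketch of that pattern, assuming the variable holds a directory that must take precedence on `sys.path` before `import tvm` runs; the helper name is hypothetical and TileLang's actual `__init__` logic may differ.

```python
import os
import sys
from typing import Optional


def prepend_tvm_python_path(env_var: str = "TVM_IMPORT_PYTHON_PATH") -> Optional[str]:
    """If `env_var` names an existing directory, put it first on sys.path so a
    later `import tvm` resolves to that build instead of a bundled copy."""
    path = os.environ.get(env_var)
    if not path or not os.path.isdir(path):
        return None
    path = os.path.abspath(path)
    if path in sys.path:
        sys.path.remove(path)  # move it to the front rather than duplicating it
    sys.path.insert(0, path)
    return path
```

The call has to happen before the first `import tvm`, because Python caches imported modules in `sys.modules`.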
