- 31 Oct, 2025 2 commits
-
-
Lei Wang authored
* 3rdparty tvm bump * bump tvm into v0.22.0 * lint fix * rebase tvm * Update submodule tvm to latest commit 3085bc4 * Refactor: Update configuration retrieval in CopyNode and adjust test registration in tilelang * test fix * add requirement * atomic_fix * atomic_fix * phaseout py39 * optimize * optimize * lint fix * do not clean cache * do not clean cache * [Minor] Minor update for Python versions and dependencies * [Lint] fix lint for py39 * [Lint] fix lint for ROCm * [Build][CI] Sync CI changes from upstream/sdist * [Lint] fix lint for ROCm * [Build][CI] Update `repair-wheel-command` * [Minor] update abi3audit result format * [Lint] fix lint for ROCm * [BugFix] fix build * [Lint] fix lint for ROCm * [BugFix] set rpath for libtvm and libtvm_runtime * [Deps] pin apache-tvm-ffi version * [Build] set Python 3.9 Limited API for Cython target * [Build] set Python 3.9 Limited API for Cython target * [Deps] Restore Python 3.8 support * [Build] use `apache-tvm-ffi`'s `libtvm_ffi` * [BugFix] use `;` as delimiter for RPATH on macOS * [BugFix] use `--ignore-missing-dependencies` for `delocate-wheel` * [Build] support `sccache` if available * [Build] add CIBW import test * [Build][CI] enable ccache for CIBW on Linux * [BugFix] set rpath for libtvm and libtvm_runtime * Revert "[Build][CI] enable ccache for CIBW on Linux" This reverts commit cd9ab57bb5ddd2572c60bcbbebde81480a658fd3. * [CI] fix perfbench bot * [BugFix] use Python 3.9 to build wheel * [Minor] update perfbench bot envs * [BugFix] fix CIBW environment on Linux * [CI] skip import test on CentOS 7 * [CI] use Python urllib to download file instead of Wget --------- Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
-
Lei Wang authored
* [Release] Update README and VERSION for v0.1.6.post2 compatibility with Python 3.8 * [Enhancement] Update packaging configuration and Docker scripts for multi-architecture support * Add allowlist for TVM, CUTLASS, and Composable Kernel items in pyproject.toml * Enhance docker_local_distribute.sh to support cross-architecture builds using docker buildx * Modify pypi.manylinux.Dockerfile to accept TARGETARCH argument for better architecture handling * [Enhancement] Improve Docker scripts and build process for multi-architecture support * Update .gitignore to include dist directories * Refactor docker_local_distribute.sh for better cross-architecture handling and error management * Enhance docker_pypi_distribute.sh to support multi-architecture builds with docker buildx * Modify pypi_distribution.sh to clean up additional directories * Update pypi.manylinux.Dockerfile for improved environment configuration and architecture handling * fix * Remove outdated classifier for Artificial Intelligence from pyproject.toml * Update pyproject.toml classifiers and modify Docker distribution scripts for clarity * Add new classifier for Artificial Intelligence in pyproject.toml * Rename output directories in docker_local_distribute.sh and docker_pypi_distribute.sh for better context
-
- 29 Oct, 2025 6 commits
-
-
Lei Wang authored
* [Refactor] Enhance TLVectorizer with loop vectorization convenience method and improve let variable handling * lint fix * let test fix * lint fix
-
LJC00118 authored
* Enhance Cast vectorized * Add Parallel vectorized cast test * code lint * merge newest commit
-
Yuqi Dong authored
* update * Update codegen_cuda.cc
-
Cunxiao Ni authored
* [BugFix] Correct direct copy from bf16 to fp8 * fix lint * implement overloaded cast codegen for type conversion * fix lint * remove test * fix lint * trigger CI * Overload fp8 for implicit conversion * format * new format * fix: Reinterpret types to cute types in GEMM * new format * fix lint * new format * fix lint * format * trigger ci --------- Co-authored-by: nicunxiao <nicunxiao@bytedance.com>
-
Xuehai Pan authored
-
Lei Wang authored
* atomic_fix * atomic_fix * mem fix * lint fix * add some comments * fix * fix * lint fix * handle async copy * lint fix * lint fix
-
- 28 Oct, 2025 5 commits
-
-
Lei Wang authored
* atomic_fix * atomic_fix * mem fix * lint fix * add some comments * fix * fix * lint fix * handle async copy * lint fix
-
Tong WU authored
[BugFix] Implement bfloat16 support in CUDA code generation with min/max functions and inf/nan values (#1143) * Implement bfloat16 support in CUDA code generation with min/max functions and inf/nan values * refactor * fix prev typo * bugfix * lint * bugfix
-
Lei Wang authored
-
Kurisu authored
* [Fix] init var with complex expression * fix lint error
-
Jiaxing Ding authored
-
- 27 Oct, 2025 9 commits
-
-
Lei Wang authored
* atomic_fix * atomic_fix
-
Zhengju Tang authored
* [BugFix] Add memory order for split version kernel; Remove torch manual seed * [Lint] Manual
-
LJC00118 authored
* Remove an incorrect check * add fp8 pack function * code lint * minor fix * minor fix * minor fix * Minor fix * Minor fix * add pack function * code lint * code lint
-
Yu Cheng authored
* [Benchmark] Update triton and helion baselines in mamba-chunk-scan * lint * update mamba baseline version
-
Xuehai Pan authored
-
Yuqi Dong authored
* update * update
-
Yu Cheng authored
* [Enhancement] Add missing primitive after mbarrier init * lint
-
dependabot[bot] authored
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 5 to 6. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/v5...v6) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...v5) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 25 Oct, 2025 1 commit
-
-
Zhengju Tang authored
* [Feature] Add memory_order PTX for vectorized (2x) atomic add * [Feature] Add memory_order PTX for all vectorized atomic add * [Lint] * test * [BugFix] Fix init optional argument in alloc_var * bug fix * bug fix * lint fix * lint fix --------- Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 24 Oct, 2025 1 commit
-
-
Lei Wang authored
* fix int32 dtype issue * lint fix * lint * lint fix --------- Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
-
- 23 Oct, 2025 4 commits
-
-
Wenhao Xie authored
* [Feature] Support None type as input for T.ptr and T.Tensor * lint * lint * lint * lint fix
-
Tong WU authored
* [Feature] Add vectorized float16 and float32 conversion support in CUDA codegen * Implemented handling for conversions between float16 and float32 types, specifically for vectorized operations using __half22float2 and __float22half2_rn. * Enhanced the existing code to support both directions of conversion based on the lane count. * Improved overall type handling in the VisitExpr_ method for better compatibility with TileLang. * [Feature] Add float32 to float8 conversion support in CUDA codegen * Implemented handling for conversion from float32 to float8 (E4M3/E5M2) in the VisitExpr_ method. * Added vectorized conversion support using __nv_cvt_float2_to_fp8x2 for float2 to fp8x2 transformations. * Enhanced type handling for better compatibility with TileLang, particularly for float8 types. * lint * fix a bug * [Enhancement] Support lanes=4 cases and add unit test for vectorized cast * lint * [Feature] Refactor bf16 conversion operations and remove legacy compile flags * lint
-
Lei Wang authored
* [Refactor] Improve scalar handling in CopyNode and update loop partition dtype logic * Refactored CopyNode::MakeSIMTLoop to handle scalar cases more efficiently by moving the scalar check to the end of the function. * Updated loop_partition.cc to set a default DataType for thread and vector extents, ensuring compatibility when loop_vars_ is empty. * lint fix * remove debug print
-
Yichen Yan authored
* update rules * ruff check * other fixes * fmt * do not touch examples * fmt
-
- 22 Oct, 2025 7 commits
-
-
Yu Cheng authored
-
Yu Cheng authored
-
Xuehai Pan authored
* [Maint] Remove pre-commit install in `format.sh` * [Maint] Update uncommitted change detection command * [Minor] update warning messages
-
Yu Cheng authored
-
Xuehai Pan authored
* [Lint] Retire `format.sh` and add `clang-tidy` to GHA workflow * chore: update clang-tidy settings * chore: upgrade clang-format and clang-tidy version * lint: resolve clang-tidy errors * [Maint] restore format.sh * [CI] pre-commit autoupdate * [Minor] fix `command -v` usage
-
Lei Wang authored
-
Lei Wang authored
* add alloc_reducer gemv example * test
-
- 21 Oct, 2025 5 commits
-
-
Zhengju Tang authored
* [Lint] * [BugFix] Freeze the memory order of all atomic_add operations * [Lint] * [Atomic] Move on to regional atomic add * [Lint]
-
Yu Cheng authored
-
Lei Wang authored
* - carry existing local-var initializer map into OpaqueBlockLower, reattach it to generated Allocates and the PrimFunc attrs - thread the map through FlattenBuffer and StorageRewrite so flattened/merged allocations keep their tl.local_var_init annotations - teach annotation handling to accept scalar initializers, resolve buffers, and merge with existing stat * lint fix * enhance * lint fix * lint fix
-
Lei Wang authored
* Enable configurable StorageRewrite inplace detection - Add kStorageRewriteDetectInplace constant and register the flag with PassContext so C++ code no longer hard-codes the key. - Wire StorageRewrite to include TileLang builtin constants and honor the new config toggle when deciding inplace reuse. - Document the flag across Python surfaces (PassConfigKey, JIT/autotuner docs) with usage guidance and simplified IR examples. * lint fix * add test * lint fix
-
Tong WU authored
* [Cleanup] Remove `tilelang.disable_cache()` calls from example scripts * lint * lint
-