  1. 31 Oct, 2025 2 commits
    • [FFI] Rebase tvm to v0.22.0 to utilize tvm-ffi (#1108) · 10911e28
      Lei Wang authored
      
      
      * 3rdparty tvm bump
      
      * bump tvm into v0.22.0
      
      * lint fix
      
      * rebase tvm
      
      * Update submodule tvm to latest commit 3085bc4
      
      * Refactor: Update configuration retrieval in CopyNode and adjust test registration in tilelang
      
      * test fix
      
      * add requirement
      
      * atomic_fix
      
      * atomic_fix
      
      * phaseout py39
      
      * optimize
      
      * optimize
      
      * lint fix
      
      * do not clean cache
      
      * do not clean cache
      
      * [Minor] Minor update for Python versions and dependencies
      
      * [Lint] fix lint for py39
      
      * [Lint] fix lint for ROCm
      
      * [Build][CI] Sync CI changes from upstream/sdist
      
      * [Lint] fix lint for ROCm
      
      * [Build][CI] Update `repair-wheel-command`
      
      * [Minor] update abi3audit result format
      
      * [Lint] fix lint for ROCm
      
      * [BugFix] fix build
      
      * [Lint] fix lint for ROCm
      
      * [BugFix] set rpath for libtvm and libtvm_runtime
      
      * [Deps] pin apache-tvm-ffi version
      
      * [Build] set Python 3.9 Limited API for Cython target
      
      * [Build] set Python 3.9 Limited API for Cython target
      
      * [Deps] Restore Python 3.8 support
      
      * [Build] use `apache-tvm-ffi`'s `libtvm_ffi`
      
      * [BugFix] use `;` as delimiter for RPATH on macOS
      
      * [BugFix] use `--ignore-missing-dependencies` for `delocate-wheel`
      
      * [Build] support `sccache` if available
      
      * [Build] add CIBW import test
      
      * [Build][CI] enable ccache for CIBW on Linux
      
      * [BugFix] set rpath for libtvm and libtvm_runtime
      
      * Revert "[Build][CI] enable ccache for CIBW on Linux"
      
      This reverts commit cd9ab57bb5ddd2572c60bcbbebde81480a658fd3.
      
      * [CI] fix perfbench bot
      
      * [BugFix] use Python 3.9 to build wheel
      
      * [Minor] update perfbench bot envs
      
      * [BugFix] fix CIBW environment on Linux
      
      * [CI] skip import test on CentOS 7
      
      * [CI] use Python urllib to download file instead of Wget
      
      ---------
      Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
    • [Release] Bump version to v0.1.6.post2 (#1160) · c37621c5
      Lei Wang authored
      * [Release] Update README and VERSION for v0.1.6.post2 compatibility with Python 3.8
      
      * [Enhancement] Update packaging configuration and Docker scripts for multi-architecture support
      
        * Add allowlist for TVM, CUTLASS, and Composable Kernel items in pyproject.toml
        * Enhance docker_local_distribute.sh to support cross-architecture builds using docker buildx
        * Modify pypi.manylinux.Dockerfile to accept TARGETARCH argument for better architecture handling
      
      * [Enhancement] Improve Docker scripts and build process for multi-architecture support
      
        * Update .gitignore to include dist directories
        * Refactor docker_local_distribute.sh for better cross-architecture handling and error management
        * Enhance docker_pypi_distribute.sh to support multi-architecture builds with docker buildx
        * Modify pypi_distribution.sh to clean up additional directories
        * Update pypi.manylinux.Dockerfile for improved environment configuration and architecture handling
      
      * fix
      
      * Remove outdated classifier for Artificial Intelligence from pyproject.toml
      
      * Update pyproject.toml classifiers and modify Docker distribution scripts for clarity
      
        * Add new classifier for Artificial Intelligence in pyproject.toml
        * Rename output directories in docker_local_distribute.sh and docker_pypi_distribute.sh for better context
  2. 29 Oct, 2025 6 commits
  3. 28 Oct, 2025 5 commits
  4. 27 Oct, 2025 9 commits
  5. 25 Oct, 2025 1 commit
  6. 24 Oct, 2025 1 commit
  7. 23 Oct, 2025 4 commits
    • [Feature] Support None type as input for `T.ptr` and `T.Tensor` (#1114) · 50e789dd
      Wenhao Xie authored
      * [Feature] Support None type as input for T.ptr and T.Tensor
      
      * lint
      
      * lint
      
      * lint
      
      * lint fix
    • [Feature] Enhance vectorized conversion support in CUDA codegen (#1095) · a148d62a
      Tong WU authored
      * [Feature] Add vectorized float16 and float32 conversion support in CUDA codegen
      
        * Implemented handling for conversions between float16 and float32 types, specifically for vectorized operations using `__half22float2` and `__float22half2_rn`.
        * Enhanced the existing code to support both directions of conversion based on the lane count.
        * Improved overall type handling in the `VisitExpr_` method for better compatibility with TileLang.
      
      * [Feature] Add float32 to float8 conversion support in CUDA codegen
      
        * Implemented handling for conversion from float32 to float8 (E4M3/E5M2) in the `VisitExpr_` method.
        * Added vectorized conversion support using `__nv_cvt_float2_to_fp8x2` for float2 to fp8x2 transformations.
        * Enhanced type handling for better compatibility with TileLang, particularly for float8 types.
      
      * lint
      
      * fix a bug
      
      * [Enhancement] Support lanes=4 cases and add unit test for vectorized cast
      
      * lint
      
      * [Feature] Refactor bf16 conversion operations and remove legacy compile flags
      
      * lint
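The lane-pair conversion described in the commit above can be illustrated on the host. The following is a minimal sketch, not the code generated by TileLang: it emulates what a two-lane float32 to float16 cast with round-to-nearest-even does numerically, using only Python's `struct` module. The function names `round_to_f32`, `cast_pair_f32_to_f16`, and `vectorized_cast_f32_to_f16` are hypothetical and exist only for this illustration.

```python
import struct


def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def cast_pair_f32_to_f16(pair):
    """Convert two float32 lanes to float16 together (hypothetical helper).

    struct's "e" format is IEEE 754 binary16 and rounds to nearest-even.
    The rounded values are returned as ordinary Python floats.
    Assumption: inputs fit in the float16 range; struct raises
    OverflowError otherwise, which device code would not do.
    """
    lo, hi = (round_to_f32(v) for v in pair)
    return struct.unpack("<2e", struct.pack("<2e", lo, hi))


def vectorized_cast_f32_to_f16(values):
    """Process a flat list two lanes at a time, mirroring a 2-wide cast."""
    if len(values) % 2 != 0:
        raise ValueError("lane count must be a multiple of 2")
    return [cast_pair_f32_to_f16(values[i:i + 2])
            for i in range(0, len(values), 2)]


if __name__ == "__main__":
    print(vectorized_cast_f32_to_f16([1.0, 2.0, 0.1, 3.0]))
```

Handling lanes in pairs is what makes a four-lane cast expressible as two paired conversions, which is presumably why the commit treats the `lanes=4` case as an extension of the two-lane path.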
    • [Refactor] Improve scalar handling in CopyNode and update loop partition dtype logi (#1111) · 86c8bb46
      Lei Wang authored
      * [Refactor] Improve scalar handling in CopyNode and update loop partition dtype logic
      
        * Refactored `CopyNode::MakeSIMTLoop` to handle scalar cases more efficiently by moving the scalar check to the end of the function.
        * Updated `loop_partition.cc` to set a default DataType for thread and vector extents, ensuring compatibility when `loop_vars_` is empty.
      
      * lint fix
      
      * remove debug print
    • [Lint] Enable pyupgrade linter in ruff (#963) · f14fb111
      Yichen Yan authored
      * update rules
      
      * ruff check
      
      * other fixes
      
      * fmt
      
      * do not touch examples
      
      * fmt
  8. 22 Oct, 2025 7 commits
  9. 21 Oct, 2025 5 commits