1. 17 Oct, 2025 7 commits
  2. 16 Oct, 2025 4 commits
  3. 15 Oct, 2025 8 commits
    • Yu Cheng's avatar
    • Tong WU's avatar
      [BugFix] Phase out dependency on Triton in sink examples to make CI happy (#1045) · 8f001e02
      Tong WU authored
      
      
      * [BugFix] Phase out dependency on Triton in sink examples to make CI happy
      
      - Added `benchmark_gqa_sink_fwd.py` and `benchmark_mha_sink_fwd.py` to evaluate performance of GQA and MHA attention mechanisms using Triton.
      - Refactored existing attention sink implementations to remove Triton kernel definitions from the reference programs, streamlining the code.
      - Updated input generation and benchmarking logic to enhance configurability and performance measurement.
      - Improved overall structure and organization of the examples for better clarity and usability.
      
      * [Lint]: [pre-commit.ci] auto fixes [...]
      
      ---------
      Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
      8f001e02
    • Xuehai Pan's avatar
      [CI][Refactor] Merge test CI workflow files into one (#973) · 8ce27782
      Xuehai Pan authored
      
      
      * refactor: merge test CI workflow files into one
      
      * chore: set `UV_INDEX_STRATEGY=unsafe-best-match`
      
      * feat: add AST test with Python 3.8
      
      * feat: implement manual caching mechanism for self-hosted runners
      
      * refactor: simplify cache logic for self-hosted runners
      
      * chore: clear uv cache on failure
      
      * chore: print format.sh output to logs
      
      * chore: improve uv caching
      
      * chore: disable parallel test
      
      * chore: use `PYTHONDEVMODE=1` in CI
      
      * feat: enable coredump generation
      
      * fix: fix perfbench condition
      
      * Revert "feat: enable coredump generation"
      
      This reverts commit c52da65cb572932e09905d08c43a39ec3cf47c54.
      
      * chore: move example CI down
      
      * Revert "chore: move example CI down"
      
      This reverts commit 9d8e65055e01d955c5268a9a6705d270c2de0d57.
      
      * chore: skip example `test_example_mha_sink_bwd_bhsd`
      
      * chore: skip example `test_example_gqa_sink_bwd_bhsd`
      
      * fix: fix example argument passing
      
      * fix: loosen test criteria
      
      * chore: rename `CMAKE_CONFIGURE_OPTIONS` -> `CLANG_TIDY_CMAKE_OPTIONS` for clarity
      
      * feat: enable parallel testing
      
      * chore: update pytest options
      
      * remove skipped test as it has now been resolved
      
      * chore: empty commit to re-trigger ci
      
      * test for n 1
      
      * chore: remove ` --numprocesses=1` option in example
      
      * chore: disable failfast
      
      * chore: update cibw selection
      
      * fix: fix git submodule clone
      
      * chore: update cibw commands
      
      * fix: fix yapf multiprocessing
      
      * chore: setup ccache for CIBW on macOS only
      
      * chore: update comments
      
      * chore: update artifact listing
      
      * fix: do not fail if not found nvcc in PATH
      
      * fix: fix flash-attn installation
      
      * chore: update dist workflow trigger
      
      * chore: remove outdated comments
      
      * chore(workflows/dist): simplify build matrix strategy
      
      * fix: fix CUDA path finding
      
      * fix: fix CUDA path finding
      
      * chore: increase CI timeout
      
      * ci: disable failfast
      
      * fix: hide path prefix
      
      * chore: more verbose
      
      * chore: disable PR trigger for dist workflow
      
      * fix: seed for tests
      
      * fix: use nightly torch for ROCm tests
      
      * chore: enable PR trigger for dist workflow
      
      * chore: stop uploading debug wheels as artifacts in PR
      
      * chore: do not run workflows in forks
      
      * chore: housekeep requirements
      
      * chore: use Nightly-ROCm-6.3 for CI
      
      * chore: use Nightly-ROCm-6.4 for CI
      
      * Update ROCm toolkit version to 7.0
      
      * chore: restore previous rocm-ci.yml for test
      
      * fix: cleanup PYTHONPATH
      
      * chore: remove previous rocm-ci.yml
      
      * ci fix
      
      * chore: remove previous rocm-ci.yml
      
      * chore: enable parallel example run
      
      ---------
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
      Co-authored-by: alex_xiao <xinyuxiao2024@gmail.com>
      8ce27782
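The `PYTHONDEVMODE=1` setting mentioned in the commit above turns on Python's development mode, which enables extra runtime checks. The workflow file itself is not shown here, so the following is only a minimal sketch of how one might confirm that a child interpreter sees the mode as active. The helper name `dev_mode_enabled` is hypothetical.

```python
import os
import subprocess
import sys


def dev_mode_enabled(env_value):
    """Run a child interpreter and report whether it sees dev mode on.

    `env_value` is the value given to PYTHONDEVMODE, or None to leave
    the variable unset. (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ)
    env.pop("PYTHONDEVMODE", None)  # start from a known state
    if env_value is not None:
        env["PYTHONDEVMODE"] = env_value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.dev_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("with PYTHONDEVMODE=1:", dev_mode_enabled("1"))
    print("without PYTHONDEVMODE:", dev_mode_enabled(None))
```

Checking `sys.flags.dev_mode` in a subprocess keeps the check independent of how the parent interpreter was started.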
    • alex_xiao's avatar
      fix bug & add AMD examples (#966) · 80665cd1
      alex_xiao authored
      * [Enhancement] Refactor buffer index handling for improved precision and clarity (#668)
      
      - Enhanced buffer index handling to address precision issues by removing redundant operations.
      - Streamlined the logic for determining buffer overlaps, ensuring more accurate conflict detection.
      - Updated related documentation to reflect changes in buffer management practices.
      
      * Remove obsolete test script for AMD example, streamlining the examples directory.
      
      * Remove unused dtype_size variable in AMD example script to streamline code.
      
      * Add input configuration file and update AMD example script for enhanced flexibility
      
      - Introduced a new input.txt file for configurable parameters.
      - Modified the example_amd_flash_attn_fwd.py script to allow for a wider range of configurations, including additional options for num_stages, enable_rasterization, and k_pack.
      - Streamlined the main function for better clarity and organization.
      - Added a new test script to facilitate running the example...
      80665cd1
    • Lei Wang's avatar
      [Language] Expose `T.get_warp_idx_sync` and `T.shuffle_elect` for efficient thread election (#989) · b78d8404
      Lei Wang authored
      
      
      * Expose CUDA warp/lane intrinsics in TileLang frontend
      
      * generalize warp indexing intrinsics and add coverage
      
      * [Lint]: [pre-commit.ci] auto fixes [...]
      
      ---------
      Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
      b78d8404
    • LJC00118's avatar
      [CUDA] Add pack functions for FP8 types (#967) · 32ddc1ac
      LJC00118 authored
      * Remove an incorrect check
      
      * add fp8 pack function
      
      * code lint
      
      * minor fix
      
      * minor fix
      
      * minor fix
      
      * Minor fix
      
      * Minor fix
      32ddc1ac
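The commit body above does not show the new pack functions. As a rough illustration of what "packing" 8-bit values generally means, and not the CUDA code added by this commit, the sketch below combines four 8-bit patterns into one 32-bit word and splits them apart again. The low-byte-first order and the names `pack4` and `unpack4` are assumptions made for the example.

```python
def pack4(b0, b1, b2, b3):
    """Pack four 8-bit patterns into one 32-bit word, lowest byte first.

    Each argument is the raw bit pattern of an 8-bit value (0..255).
    Byte order is an assumption of this sketch.
    """
    for b in (b0, b1, b2, b3):
        if not 0 <= b <= 0xFF:
            raise ValueError(f"not an 8-bit pattern: {b!r}")
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit patterns."""
    return tuple((word >> (8 * i)) & 0xFF for i in range(4))


if __name__ == "__main__":
    word = pack4(0x38, 0x40, 0xC0, 0x00)
    print(hex(word))
    print([hex(b) for b in unpack4(word)])
```

Packing this way lets several narrow values travel in a single register or memory transaction; the sketch only models the bit layout, not any FP8 arithmetic.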
    • Lei Wang's avatar
      c67f73b0
    • Lei Wang's avatar
      [TIR] Revert some changes of Pass `LowerIntrin` (#1035) · e5399527
      Lei Wang authored
      
      
      * keep >> instead of /
      
      * re think replicate
      
      * lint fix
      
      * handle const int buffers
      
      * rep fix
      
      ---------
      Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
      e5399527
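The commit above keeps `>>` rather than rewriting it as `/`. The message does not state the reason, but one general fact is relevant: the two operations disagree on negative integers when division truncates toward zero, as it does in C and CUDA, because an arithmetic right shift rounds toward negative infinity. A small Python sketch of the difference follows; `trunc_div` is a hypothetical helper that emulates C-style integer division.

```python
def trunc_div(a, b):
    """Integer division that truncates toward zero, as in C.

    Hypothetical helper: Python's own // operator floors instead.
    """
    q = abs(a) // abs(b)
    negative = (a < 0) != (b < 0)
    return -q if negative else q


def shift_right(a, k):
    """Arithmetic right shift; on Python ints this floors the result."""
    return a >> k


if __name__ == "__main__":
    for value in (7, -7, 8, -8):
        print(value, shift_right(value, 1), trunc_div(value, 2))
```

For non-negative inputs and for negative inputs that divide evenly, the two agree. They differ only when a negative value has a remainder, which is exactly the case a rewrite from shift to division can silently change.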
  4. 14 Oct, 2025 8 commits
  5. 13 Oct, 2025 4 commits
    • Cunxiao Ni's avatar
      [CI] Removes redundant environment variable (#1020) · eb37e459
      Cunxiao Ni authored
      * [CI] Removes redundant environment variable
      Removes the `UV_INDEX_URL` environment variable
      
      * trigger CI
      
      * trigger CI
      
      * trigger CI
      
      * trigger CI
      eb37e459
    • Yichen Yan's avatar
      [Build] Migrate to scikit-build-core (#939) · d89ba5b8
      Yichen Yan authored
      
      
      * cleanup
      
      * init
      
      * build first wheel that may not work
      
      * build cython ext
      
      * fix tvm build
      
      * use sabi
      
      * update rpath to support auditwheel
      
      * pass editable build
      
      * update ci
      
      * fix warnings
      
      * do not use ccache in self host runner
      
      * test local uv cache
      
      * test pip index
      
      * update lib search to respect new lib location
      
      * fix
      
      * update ci
      
      * enable cuda by default
      
      * update src map
      
      * fix
      
      * fix
      
      * fix
      
      * Generate version with backend and git information at build time
      
      * copy tvm_cython to wheels
      
      * fix tvm lib search
      
      * fmt
      
      * remove unused
      
      * auto detect ccache
      
      * add back backend-related files
      
      * remove jit cython adaptor to simplify code
      
      * fmt
      
      * fix ci
      
      * ci fix 2
      
      * ci fix 3
      
      * workaround metal
      
      * ci fix 4
      
      * fmt
      
      * fmt
      
      * Revert "ci fix 4"
      
      This reverts commit d1de8291c3e40927955f3ad3cf87a75c78813676.
      
      * tmp
      
      * fix metal
      
      * trivial cleanup
      
      * add detailed build-time version for cuda
      
      * add back mlc
      
      * Restore wheel info and other trivial updates
      
      * update
      
      * fix cuda
      
      * upd
      
      * fix metal ci
      
      * test for ga build
      
      * test for nvidia/cuda
      
      * test ubuntu 20
      
      * fix
      
      * fix
      
      * Do not use `uv build`
      
      * fix
      
      * fix
      
      * log toolchain version
      
      * merge wheel
      
      * update
      
      * debug
      
      * fix
      
      * update
      
      * skip rocm
      
      * update artifacts each
      
      * fix
      
      * fix
      
      * add mac
      
      * fix cache
      
      * fix cache
      
      * fix cache
      
      * reset and add comment
      
      * upd
      
      * fix git version
      
      * update deps
      
      * trivial update
      
      * use in-tree build dir and install to src to speedup editable build
      
      * Revert "use in-tree build dir and install to src to speedup editable build"
      
      This reverts commit 6ab87b05c5eed811210136b8dca4fc3677dd51f2.
      
      * add build-dir
      
      * update docs
      
      * remove old scripts
      
      * [1/n] cleanup scripts
      
      * [Lint]: [pre-commit.ci] auto fixes [...]
      
      * fix and update
      
      * wait for tvm fix
      
      * revert some tmp fix
      
      * fix
      
      * fix
      
      * spell
      
      * doc update
      
      * test cibuildwheel
      
      * fix and test macos on ci
      
      * Update .github/workflows/dist.yml
      Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ga event
      
      * cleanup
      
      * bump tvm to support api3
      
      * test final version
      
      * add cron
      
      * Update .github/workflows/dist.yml
      Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ccache for metal cibuildwheel
      
      * test newer macos
      
      * finish
      
      ---------
      Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
      Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
      d89ba5b8
    • Lei Wang's avatar
    • Yuqi Dong's avatar
      [Bugfix] Fix atomicadd auto vectorize identify var error (#883) · 340bfc50
      Yuqi Dong authored
      * update
      
      * update
      
      * update
      
      * update
      340bfc50
  6. 12 Oct, 2025 3 commits
  7. 11 Oct, 2025 6 commits