1. 26 Nov, 2025 4 commits
    • [Enhancement] add more dtype and fix mma.ws for fp16 for tcgen05 (#1327) · f0c721a4
      Yunqian Fan authored
      * feat: add fp8 variants; add placeholder for fp6/fp4 in meta
      
      support ld with pack for fp32 dtype
      
      add dump
      
      add template expand
      
      remove unused dtype and change to rebased apis
      
      * fix: when atom-m!=128, enable_ws
      
      * fix: typo in tcgen05 meta; dispatch in gemm sm100
    • [Refactor] Phaseout vmap for Tile Operators (#1334) · f5d9da46
      Lei Wang authored
      
      
      * Refactor GEMM and Reduce operations by moving NormalizeToBufferRegion and MakeAccessPtrFromRegion to utils.{h,cc} for better code organization and reuse.
      
      * lint fix
      
      * Refactor region handling by removing the RegionOp and updating NormalizeToBufferRegion to only accept BufferLoad and BufferRegion. This change improves code organization and simplifies the handling of memory regions across various operations.
      
      * fix
      
      * Refactor memory region handling by introducing `tl.region` calls across various operations, including GEMM and fill functions. This change enhances the consistency of region management and improves code organization by utilizing utility functions for buffer region conversions.
      
      * fix
      
      * fix
      
      * test fix
      
      * lint fix
      
      * Refactor GEMM operations to improve memory region handling by replacing `mbarPtr_` with `mbarRegion_` and updating related logic in both C++ and Python implementations. This change enhances the clarity and consistency of buffer region management.
      
      * fix
      
      * lint fix
      
      * fix
      
      * fix
      
      * test fix
      
      * lint fix
      
      * lint fix
      
      * minor fix
      
      * fix
      
      ---------
      Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
    • [Feat] Extend LegalizeNegativeIndex to support buffer store stmts (#1339) · fac04006
      ConvolutedDog authored
      This commit enhances the LegalizeNegativeIndex transformation pass to handle
      both buffer load and store operations with negative indices and adds some
      test cases.
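As a rough illustration of what legalizing a negative index means for both loads and stores, the sketch below wraps a negative index by adding the buffer extent once. It is plain Python operating on lists, not the TileLang pass itself; `legalize_index`, `load`, and `store` are hypothetical helper names chosen for this example.

```python
def legalize_index(index: int, extent: int) -> int:
    """Map an index in [-extent, extent) into [0, extent) by adding the extent once."""
    if not -extent <= index < extent:
        raise IndexError(f"index {index} out of range for extent {extent}")
    return index + extent if index < 0 else index


def load(buffer: list, index: int):
    # Read path: legalize the index before accessing the buffer.
    return buffer[legalize_index(index, len(buffer))]


def store(buffer: list, index: int, value) -> None:
    # Write path: the same legalization applies to stores.
    buffer[legalize_index(index, len(buffer))] = value


buf = [10, 20, 30, 40]
store(buf, -1, 99)     # writes to position 3
last = load(buf, -1)   # reads position 3
second = load(buf, -3) # reads position 1
```

The point of the sketch is only that reads and writes share one normalization rule; how the real pass proves an index negative is not shown here.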
    • Add unit tests for T.assume (#1341) · f810f976
      LJC00118 authored
      
      
      * Add test for T.assume
      
      * Add unit test for T.assume
      
      * Add unit test for T.assume
      
      * Add unit tests for T.assume
      
      * Remove debug print for kernel source
      
      Remove print statement for kernel source in tests.
      
      * Update test_tilelang_language_assume.py
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
  2. 25 Nov, 2025 5 commits
  3. 24 Nov, 2025 8 commits
  4. 23 Nov, 2025 1 commit
    • [Refactor] Backup Analyzer to get the appropriate arith informations (#1311) · 9f7bac4c
      Lei Wang authored
      * [Refactor] Update Vectorization Functions to Accept Analyzer Parameter
      
      - Modified `VectorizeLoop` and related functions to accept an `arith::Analyzer` parameter, enhancing their capability to perform analysis during vectorization.
      - Updated multiple instances in `copy.cc`, `fill.cc`, `parallel.cc`, and layout inference files to utilize the new analyzer parameter for improved performance and correctness.
      - Ensured consistency across vectorization logic by integrating the analyzer into existing workflows, facilitating better optimization opportunities.
      
      * [Fix] Corrected PostOrderVisit call in loop_vectorize.cc
      
      - Updated the PostOrderVisit function to analyze the body of the loop node instead of the node itself, ensuring proper handling of nested loops during vectorization analysis.
      
      * fix
      
      * lint fix
      
      * fix
  5. 22 Nov, 2025 2 commits
  6. 21 Nov, 2025 4 commits
  7. 20 Nov, 2025 4 commits
  8. 19 Nov, 2025 5 commits
  9. 18 Nov, 2025 7 commits
    • [FFI] Use tvm ffi as the default execution backend (#1259) · 74da3696
      Lei Wang authored
      * [Refactor] Update FFI type handling and simplify argument management
      
      * Refactored FFI type definitions in runtime and code generation files to use `TVMFFIAny` instead of `TVMValue`, enhancing type clarity.
      * Updated function registration in `runtime.cc` to utilize canonical names for better consistency.
      * Simplified argument handling in the `simplify` transformation, ensuring unused buffer parameters are removed only when simplification is enabled.
      * Adjusted autotuner and profiler parameters to standardize the execution backend to `tvm_ffi`, improving clarity in backend selection.
      * Removed obsolete `adapt_torch2tvm` function from tensor utilities to streamline the codebase and reduce complexity.
      
      * [Update] Sync TVM submodule and enhance kernel source handling
      
      * Updated the TVM submodule to commit cdc2aced, ensuring compatibility with recent changes.
      * Added functionality to print kernel source in `example_blocksparse_gemm.py` for better debugging.
      * Commented out the main execution call in test files to prevent unintended execution during testing.
      * Introduced `tilelang.disable_cache()` in various test files to streamline testing and avoid cache-related issues.
      * Refactored kernel source retrieval methods to improve clarity and consistency across different execution backends.
      
      * [Refactor] Clean up imports and improve code formatting
      
      * Removed unused import of `tilelang.testing` in `test_example_blocksparse_gemm.py` to streamline the code.
      * Reformatted several lines in `arg_binder.cc`, `make_packed_api.cc`, `tvm_ffi.py`, and `adapter.py` for improved readability and consistency.
      * Updated comments and spacing in `tvm_ffi.py` to enhance clarity without altering functionality.
      
      * Update execution backend options and improve resolution logic
      
      - Changed default execution backend from "cython" to "auto" in multiple locations to allow automatic selection based on the target.
      - Expanded the list of supported execution backends to include "torch" and "nvrtc" across various classes and functions.
      - Enhanced backend resolution logic in `KernelCache` and `AutoTuner` to ensure appropriate backend selection based on the target.
      - Updated documentation to reflect changes in execution backend options and their defaults.
      
      * lint fix
      
      * fix
      
      * Enhance argument handling in CUDA and HIP runtime modules
      
      - Updated `ExtractFuncInfo` in `rt_mod_cuda.cc` and `rt_mod_hip.cc` to map boolean argument types to int32, ensuring compatibility with device runtime.
      - Refactored `BindDLTensor` in `arg_binder.cc` to improve null handling and validation checks for DLTensor parameters, utilizing expression-level guards to prevent dereferencing null pointers.
      - Enhanced error checking for buffer shape, strides, and data fields, ensuring robust handling of optional inputs and maintaining consistency across various checks.
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * minor fix
      
      * fix
      
      * recover check
      
      * Refactor argument binding and validation in `arg_binder.cc`
      
      - Improved null handling and validation checks in `BindDLTensor`, ensuring safe dereferencing of pointers.
      - Enhanced consistency checks for buffer shape, strides, and data fields, utilizing expression-level guards.
      - Updated `MakePackedAPI` to maintain code clarity and consistency in argument handling.
      - Minor adjustments in test files to streamline kernel execution and improve readability.
      
      * lint fix
      
      * stride fix
      
      * minor fix
      
      * fix
      
      * lint fix
      
      * lint fix
      
      * Add CUDA stream access policy window helpers and integrate with L2 persistent cache management
      
      - Introduced functions to set and reset the CUDA stream access policy window, allowing for better control over L2 cache usage.
      - Updated runtime files to include new FFI packed functions for managing stream attributes.
      - Modified lower_hopper_intrin to incorporate prologue and epilogue statements for L2 cache setup and teardown.
      - Enhanced tests to verify the inclusion of new FFI calls in the generated kernel source.
      
      * check with symbolic
      
      * support null ptr
      
      * Update CMakeLists and lower.py for code generation and subproject status
      
      - Added `codegen_c_host.cc` to the list of source files in CMakeLists.txt for improved code generation support.
      - Updated the function call in `lower.py` to use `target.build.tilelang_c` for C target host code generation, enhancing compatibility.
      - Marked the TVM subproject as dirty to indicate local modifications.
      
      * lint fix
      
      * Update comments for clarity in quickstart.py
    • [Language] Add shape check in `T.view/reshape` (#1277) · 921b96a3
      Chaofan Lin authored
      * [Language] Add shape check in T.view/reshape
      
      * address comments
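The commit message does not say exactly what the new check verifies. One plausible reading, sketched below purely as an assumption, is that a view or reshape must preserve the total storage size of the buffer. `check_view_shape` and its parameters are hypothetical names and are not part of the TileLang API.

```python
from math import prod


def check_view_shape(src_shape, dst_shape, src_bits=32, dst_bits=32):
    """Return dst_shape as a tuple if both shapes cover the same number of bits."""
    src_total = prod(src_shape) * src_bits
    dst_total = prod(dst_shape) * dst_bits
    if src_total != dst_total:
        raise ValueError(
            f"cannot view {tuple(src_shape)} ({src_bits}-bit) as "
            f"{tuple(dst_shape)} ({dst_bits}-bit): {src_total} != {dst_total} bits"
        )
    return tuple(dst_shape)


flat = check_view_shape((4, 8), (32,))                               # same dtype, same element count
halved = check_view_shape((4, 8), (4, 16), src_bits=32, dst_bits=16) # narrower dtype, twice the elements
```

Comparing bit counts rather than element counts lets the same helper cover a dtype-changing view as well as a plain reshape.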
    • [Bugfix] Minor fix for some cases (#1278) · 1b0efb65
      Lei Wang authored
    • Bug fix for Gated Delta Net benchmark script (#1267) · 0f980f15
      Jay Zhuang authored
      
      
      * fix argument order for fla chunk_gated_delta_rule_fwd_h
      
      * explicit import assert_similar from utils
      
      * rename utils module to avoid name clash
      
      * set store_final_state and save_new_value to True
      
      * fix
      
      ---------
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
    • Fix various issues under `int64_t` static and dynamic shape. (#1218) · 49c85715
      Elevator14B authored
      
      
      * Fix various issues under int64_t static and dynamic shape.
      
      * Resolve reviewed issues.
      
      * Add unit test.
      
      * fix
      
      ---------
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
    • [BugFix] Adding extra parameters into autotune hashkey (#1274) · e805f8e5
      Chaofan Lin authored
      * [BugFix] Adding extra parameters into autotune hashkey
      
      * lint
      
      * None check
      
      * check serializable
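The steps listed in this commit (fold extra parameters into the key, handle `None`, check that the values are serializable) can be sketched generically as follows. This is a minimal standard-library illustration and not the TileLang autotuner code; `autotune_key` and its arguments are hypothetical.

```python
import hashlib
import json


def autotune_key(func_source: str, configs, extra_params=None) -> str:
    """Build a stable cache key that also depends on the extra parameters."""
    # None check: treat a missing dict the same as an empty one.
    extra = {} if extra_params is None else dict(extra_params)
    payload = {"source": func_source, "configs": configs, "extra": extra}
    try:
        # sort_keys makes the key independent of dict insertion order.
        blob = json.dumps(payload, sort_keys=True)
    except TypeError as err:
        # Serializable check: refuse values that cannot be hashed reproducibly.
        raise ValueError(f"autotune parameters are not serializable: {err}") from err
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


key_default = autotune_key("def kernel(): ...", [{"block_M": 128}])
key_empty = autotune_key("def kernel(): ...", [{"block_M": 128}], {})
key_extra = autotune_key("def kernel(): ...", [{"block_M": 128}], {"warmup": 5})
```

Without the extra parameters in the payload, two tuning runs that differ only in those parameters would map to the same cache entry, which is the kind of collision the commit title suggests it fixes.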
    • Yichen Yan's avatar