  1. 06 Dec, 2025 7 commits
    • [Fix] typo in cuda attr (#1380) · 8f50c122
      Yunqian Fan authored
      * [Bugfix] make cuda driver api compat with cuda12/13, along with tests
      
      * fix typo in cudaDevAttr
      8f50c122
    • Yunqian Fan's avatar
    • [Builder] Enhance variable name binding and scope management (#1378) · 3f8e6b59
      Lei Wang authored
      - Improved handling of TVM Var/Buffer names to prevent out-of-scope errors when reusing Python names across different for-frames.
      - Added assertions to ensure variables are defined within the correct control flow frame, enhancing error checking and code reliability.
      3f8e6b59
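The scope rule described in this commit can be pictured with a small frame stack. The sketch below is plain Python and does not use the TileLang builder; `FrameStack` and its methods are hypothetical names chosen for illustration. It shows, under those assumptions, how a Python name reused in two sibling for-frames can resolve to a fresh variable in each, and how an assertion can catch a use after the frame has closed.

```python
class FrameStack:
    """Hypothetical stand-in for a builder's control-flow frame stack."""

    def __init__(self):
        self.frames = [{}]  # frames[0] models the function-level scope

    def push(self):
        self.frames.append({})

    def pop(self):
        assert len(self.frames) > 1, "cannot pop the function-level frame"
        self.frames.pop()

    def bind(self, name, value):
        # A binding always lands in the innermost open frame.
        self.frames[-1][name] = value

    def lookup(self, name):
        # Search innermost to outermost; closed frames are no longer visible.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise AssertionError(f"{name!r} is not defined in any open frame")


def demo():
    scopes = FrameStack()
    seen = []
    for loop_id in ("first_loop", "second_loop"):
        scopes.push()
        scopes.bind("tmp", f"tmp@{loop_id}")  # same Python name, new variable
        seen.append(scopes.lookup("tmp"))
        scopes.pop()
    try:
        scopes.lookup("tmp")  # both for-frames are closed by now
        leaked = True
    except AssertionError:
        leaked = False
    return seen, leaked
```

Calling `demo()` binds `tmp` once per frame and then checks that the name is no longer reachable after both frames are popped.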
    • [Language] Tilelang LazyJIT Experimental Version (#1337) · 0921328d
      Kuris authored
      
      
      * initial step
      
      * modify builder
      
      * scratch version of new frontend
      
      * write some tests
      
      * add many tests
      
      * add typing stub for tir.ir
      
      * remove idents
      
      * minor update
      
      * minor update
      
      * First version of jitv2 (renamed to LazyJIT)
      
      * fix pre-commit error
      
      * minor fix
      
      * fix lint error
      
      * fix lint error
      
      * Fix conditional check for PrimFunc instance
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
      0921328d
    • Yichen Yan's avatar
    • [Tool] Provide layout visualization tool (#1353) · 924225ed
      Cunxiao Ni authored
      * Provide layout visualization tool
      
      Adds a layout visualization tool to TileLang, which helps users understand and debug the layout transformations applied during compilation.
      
      This tool visualizes the memory layout of tensors at different stages of the compilation process, allowing developers to identify potential inefficiencies and optimize their code for better performance.
      
      The visualization can be enabled via a pass config option.
      
      * format
      
      * add layout visual example
      
      * Adds vis extra with matplotlib dependency
      
      * refactor pass config name
      
      * fix lint
      
      * Enables configurable layout visualization formats
      
      Allows users to specify the output formats (png, pdf, svg) for layout visualization through a pass config option.
      
      This change provides more flexibility in how layout visualizations are generated, allowing users to choose the formats that best suit their needs.
      
      It also fixes a bug where layout visualization was not correctly disabled when the config option was set to "false".
      
      * Adds visual layout inference tool docs
      
      * fix lint
      
      * fix lint
      
      * Refactor configurable layout visualization formats
      
      * fix lint
      
      * fix typo
      
      * add some comments
      
      * fix lints
      
      * add some warnings for user
      
      * Moves layout visualization
      
      * Refactors layout visualization pass configuration
      
      Updates the layout visualization pass configuration to use boolean flag for enabling and a string for specifying formats.
      
      * Enables multiple layout visualization formats
      
      * Updates layout visualization docs
      
      * Moves layout visualization to analysis
      924225ed
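The final configuration shape described in this commit (a boolean flag to enable the tool plus a string naming the formats) could be resolved roughly as follows. This is a minimal sketch, not the TileLang implementation: `resolve_formats` is a hypothetical helper, and the comma-separated syntax of the format string is an assumption. Only the format names png, pdf, and svg come from the commit message.

```python
SUPPORTED_FORMATS = ("png", "pdf", "svg")  # formats named in the commit message


def resolve_formats(enabled, formats):
    """Return the output formats to generate (hypothetical helper).

    `enabled` is the boolean flag; `formats` is assumed to be a
    comma-separated string such as "png,svg".
    """
    if not enabled:
        # A disabled tool produces nothing, whatever the format string says.
        return []
    requested = [f.strip().lower() for f in formats.split(",") if f.strip()]
    unknown = [f for f in requested if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported layout visualization formats: {unknown}")
    # Drop duplicates while keeping the requested order.
    return list(dict.fromkeys(requested))
```

Keeping the on/off switch as a real boolean, separate from the format string, avoids the class of bug mentioned above, where a string such as "false" is treated as a truthy value.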
    • [Enhancement] Introduce buffer var lca analysis for pass plan buffer allocations (#1376) · f8e7fef5
      Lei Wang authored
      * Update submodule TVM to latest commit and add PlanAndUpdateBufferAllocationLocation function to transform module
      
      - Updated the TVM submodule to commit 3a32b763.
      - Added a new function `PlanAndUpdateBufferAllocationLocation` in the transform module to facilitate buffer allocation planning within PrimFuncs.
      
      * Refactor buffer allocation code for improved readability and consistency
      
      - Updated formatting and spacing in `plan_update_buffer_allocation_location.cc` for better code clarity.
      - Standardized the use of pointer and reference syntax across various class methods.
      - Enhanced comments for better understanding of buffer allocation logic.
      - Removed unnecessary lines and improved overall code structure.
      
      * Refactor buffer allocation checks for improved clarity
      
      - Replaced size checks with empty checks for `ffi::Array<Buffer>` in `plan_update_buffer_allocation_location.cc` to enhance code readability.
      - Updated conditions in multiple methods to use `empty()` instead of comparing size to zero, streamlining the logic.
      f8e7fef5
  2. 05 Dec, 2025 1 commit
    • [Layout] Enhance Free Layout Inference (#1375) · 6654064d
      Lei Wang authored
      * [Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py
      
      * [Enhancement] Extend support for float8 data types in GEMM operations
      
      - Updated GEMM operations to recognize additional float8 data types: `float8_e4m3fn` and `float8_e5m2fnuz`.
      - Refactored condition checks in `checkWgmma` methods to simplify float8 type handling.
      - Adjusted test cases to ensure compatibility with the new float8 types in tile language examples.
      
      * lint fix
      
      * [Enhancement] Add injective layout detection and exception handling
      
      - Introduced `DetectInjective` method in `FragmentNode` to check for injective layouts.
      - Added `LoopLayoutInjectiveException` to handle errors related to non-injective layouts.
      - Updated `InferLayout` methods in `ParallelOpNode` to utilize injective checks and log relevant information.
      - Refactored layout inference queue management to use `std::deque` for improved performance and added prioritization logic for buffer layouts.
      
      * remove debug print
      
      * remove debug print
      
      * remove debug print
      
      * minor layout fix
      
      * fix for T.view
      
      * [Enhancement] Improve injective layout detection in FragmentNode
      
      - Updated the `DetectInjective` method to handle symbolic dimensions more effectively by introducing a mechanism to collect symbolic shapes and adjust the detection level accordingly.
      - Added logging for cases where the layout detection falls back to NoCheck due to symbolic dimensions.
      - Minor update to the test file to include the tilelang testing module.
      
      * [Refactor] Simplify layout inference for bulk copy operations
      
      - Removed unnecessary conditions for bulk load/store operations in the layout inference logic.
      - Streamlined the handling of layout application for bulk copy instances to enhance clarity and maintainability.
      
      * remove debug print
      
      * [Enhancement] Introduce layout-related exceptions and improve error handling
      
      - Added `LayoutConflictException` and `LoopLayoutInjectiveException` classes for better exception management in layout operations.
      - Updated `InferLayout` method in `ParallelOpNode` to throw `LoopLayoutInjectiveException` with detailed error information when injective layout checks fail.
      - Removed redundant exception class definitions from `parallel.h` to streamline code organization.
      6654064d
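To make the injectivity requirement concrete: a loop layout maps each loop index tuple to a target such as a (thread, local slot) pair, and it is injective when no two index tuples land on the same target. The sketch below checks this by brute force over concrete extents. It is an illustration only; the function names are hypothetical, and the `DetectInjective` method mentioned above also has to deal with symbolic dimensions, which this sketch does not attempt.

```python
from itertools import product


def is_injective(layout, extents):
    """Brute-force check that `layout` sends distinct index tuples to distinct targets."""
    seen = set()
    for idx in product(*(range(e) for e in extents)):
        target = layout(*idx)
        if target in seen:
            return False  # two loop iterations collide on the same target
        seen.add(target)
    return True


def row_major_layout(i, j):
    # 8x4 iterations spread over 16 threads with 2 local slots each.
    flat = i * 4 + j
    return (flat % 16, flat // 16)


def drops_inner_index(i, j):
    # Ignores j, so the four iterations of each row collide.
    return (i, 0)
```

With extents `(8, 4)`, `row_major_layout` passes the check and `drops_inner_index` fails it, which is the situation an exception such as `LoopLayoutInjectiveException` is meant to report.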
  3. 03 Dec, 2025 2 commits
    • [Refactor] Generalize fp8 process (#1372) · 92121fc6
      Lei Wang authored
      * [Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py
      
      * [Enhancement] Extend support for float8 data types in GEMM operations
      
      - Updated GEMM operations to recognize additional float8 data types: `float8_e4m3fn` and `float8_e5m2fnuz`.
      - Refactored condition checks in `checkWgmma` methods to simplify float8 type handling.
      - Adjusted test cases to ensure compatibility with the new float8 types in tile language examples.
      
      * lint fix
      92121fc6
    • Yuqi Dong's avatar
  4. 02 Dec, 2025 5 commits
  5. 01 Dec, 2025 5 commits
    • [Enhancement] Implement dynamic unroll factor in CUDA code generation (#1360) · 388ee7ee
      Lei Wang authored
      * [Enhancement] Implement dynamic unroll factor in CUDA code generation
      
      This commit introduces support for specifying a dynamic unroll factor in the CUDA code generation. The `unroll_factor` map is added to store unroll factors for loop variables, allowing for more flexible and optimized loop unrolling. Additionally, the `unroll` function is integrated into the loop language, enabling users to define unroll factors directly in their code. This enhancement improves performance by allowing tailored unrolling strategies based on specific loop characteristics.
      
      * lint fix
      
      * [Bugfix] Correct initialization of non-zero counters in custom compress kernel and update TIR registration for gemm_sp_py to use the correct tile operation
      388ee7ee
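The per-loop unroll factor can be pictured as a lookup performed while emitting each loop. The sketch below is a toy code generator in Python, not the TileLang CUDA backend: `emit_loop` is a hypothetical function, and the choice to emit no pragma when a loop variable has no entry in the map is an assumption. The `#pragma unroll N` form itself is standard CUDA.

```python
def emit_loop(loop_var, extent, body, unroll_factor):
    """Return CUDA-like source lines for one loop.

    `unroll_factor` maps loop variable names to integer factors,
    mirroring the map described in the commit message.
    """
    lines = []
    factor = unroll_factor.get(loop_var)
    if factor is not None:
        # A factor was recorded for this loop variable: request it explicitly.
        lines.append(f"#pragma unroll {factor}")
    lines.append(f"for (int {loop_var} = 0; {loop_var} < {extent}; ++{loop_var}) {{")
    lines.extend("  " + line for line in body)
    lines.append("}")
    return lines
```

For example, `emit_loop("k", 64, ["acc += a[k] * b[k];"], {"k": 4})` starts with `#pragma unroll 4`, while a loop variable missing from the map is emitted without any pragma.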
    • Lei Wang's avatar
    • [Language] support `T.gemm_sp_v2` on sm80 and sm89 (#1056) · 283a9a00
      botbw authored
      * [misc] add a cpp side wrapper for gemm_sp_py
      
      * [misc] typing
      
      * [IR] bind GemmSPWarpPolicy
      
      * [chore] add wrapper code
      
      * [IR] fix GemmSPWarpPolicy
      
      * [codegen] apply ptxas instructions
      
      * [intrinsic] add typical (unused) mma layout
      
      * [template] add uint16 debug func
      
      * [intrinsic] add b matrix layout
      
      * [gemm_sp] enable fp16/bf16 on sm8x
      
      * [layout] refactor fp16/bf16 layout
      
      * [gemm_sp] enable int8
      
      * [chore] update test case dtype
      
      * [gemm_sp] enable fp32
      
      * [layout] refactor layouts
      
      * [intrinsic] enable ldmatrix for mat A
      
      * [layout] enable ldsm for matrix b
      
      * [layout] add ldmatrix for fp32 and fp8
      
      * [chore] refine
      
      * [chore] refactor
      
      * [chore] add fp8 efactor
      
      * [chore] refactor
      
      * [chore] add remove negative zero util
      
      * [example] add a custom compress kernel
      
      * [chore] minor update
      
      * [test] refactor gemm_sp test
      
      * [refactor] make metadata layout func
      
      * [example] add option for using cutlass layout
      
      * [doc] add a gemm_sp doc
      
      * [doc] minor polish
      
      * [chore] remove unused
      
      * [bugfix] fix non replicate b case
      
      * [test] refactor
      
      * [chore] add a check
      
      * [bugfix] fix util bug
      
      * [wip] init a new test case for v2
      
      * [chore] minor refactor
      
      * [chore] minor update
      
      * [bugfix] enable 16bit rs
      
      * [language] enable rs
      
      * [language] enable gemm_sp_sr
      
      * [language] enable gemm_sp_rr
      
      * [test] enable more tests
      
      * [tvm] update ffi binding
      
      * [chore] remove print
      
      * [chore] fix benchmark script
      
      * [lint] precommit lint
      
      * [chore] apply feedback
      
      * [test] use arch 8.0
      
      * [chore] rollback ::ordered_metadata for backward compatibility
      
      * [bugfix] fix capitalized
      
      * [example] keep gemm_sp on hopper
      
      * [test] fix no fp8 normal kernel
      
      * [test] reduce matmul size to satisfy accum error
      
      * [test] use cal_diff for assertion
      
      * [bugfix] expand float8 type
      
      * [lib] add make_int4 for short type
      
      * [language] add transpose E
      
      * [bugfix] fix wrong var
      
      * [format] format
      
      * [chore] refactor binding
      
      * [chore] fix wrong passing var
      283a9a00
    • [Analysis] Enhance NestedLoopChecker with tile op cases (#1358) · b10ef75f
      Chaofan Lin authored
      * [Analysis] Enhance NestedLoopChecker with tile op cases
      
      * fix tileop issue
      b10ef75f
    • [Refactor] Update Fragment Indexing in ParallelOpNode's InferLayout Method (#1359) · 1b42c87b
      Lei Wang authored
      This commit refines the Fragment creation process in the InferLayout method of ParallelOpNode. It removes the unnecessary forward_index array and utilizes default fragment indexing for consistency with other operations. Additionally, it binds the thread range to enhance comparability across different operations.
      1b42c87b
  6. 30 Nov, 2025 1 commit
  7. 28 Nov, 2025 3 commits
    • [Bugfix] Disable floordiv optimization due to integer overflow risk (#1355) · a4ea7da9
      LJC00118 authored
      * disable overflow-prone floordiv optimization in lower_intrin.cc
      
      * disable overflow-prone floordiv optimization in lower_intrin.cc
      a4ea7da9
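The commit message does not say which rewrite was disabled, so the following is only an illustration of how a floordiv optimization can overflow. It assumes one common form: replacing `floordiv(a, b)` with `truncdiv(a + b*c, b) - c`, where `c` is chosen so that `a + b*c` is non-negative. The identity holds over the integers, but in 32-bit arithmetic the shifted numerator can wrap around. All helper names below are hypothetical.

```python
def wrap_int32(x):
    """Wrap a Python int to two's-complement int32."""
    return (x + 2**31) % 2**32 - 2**31


def trunc_div(a, b):
    """C-style integer division, truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def floordiv_shifted(a, b, c):
    """Evaluate truncdiv(a + b*c, b) - c with int32 wraparound at each step."""
    shifted = wrap_int32(a + wrap_int32(b * c))
    return wrap_int32(trunc_div(shifted, b) - c)


# Small operands: the rewrite agrees with true floor division.
small_ok = floordiv_shifted(-5, 4, 2) == -5 // 4

# Operand near INT32_MAX: a + b*c wraps negative and the result is wrong.
a_big = 2**31 - 1
overflowed = floordiv_shifted(a_big, 4, 2) != a_big // 4
```

Both `small_ok` and `overflowed` evaluate to `True`: the rewrite is exact for small operands and silently wrong once the shifted numerator exceeds the int32 range, which is the kind of risk that motivates disabling such an optimization.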
    • [Enhancement] Improve error handling and assertion messages across runtime and argument binding (#1356) · 17cfeb76
      Lei Wang authored
      
      This commit enhances the error handling mechanisms in the runtime by introducing CPU-safe runtime helpers and refining assertion messages in the CodeGenCHost and ArgBinder. It includes structured packed error messages for various conditions, improving clarity in diagnostics. Additionally, the CMake configuration is updated to always include necessary runtime helpers, ensuring consistent error reporting. The changes aim to provide clearer feedback during runtime errors and improve the overall robustness of the argument binding process.
      17cfeb76
    • [Refactor] Simplify index sign state handling in LegalizeNegativeIndex (#1354) · 36a2b2f3
      Lei Wang authored
      This commit refines the logic for determining the sign state of indices in the LegalizeNegativeIndex transformation. It prioritizes vector patterns, specifically Ramp and Broadcast nodes, to avoid compile-time lane queries. The handling of scalar indices is also streamlined, ensuring clearer diagnostics when non-negativity cannot be proven. These changes enhance the robustness and clarity of index handling in the transformation pass.
      36a2b2f3
  8. 27 Nov, 2025 2 commits
    • [Refactor] Improve assertion handling in CodeGenCHost and ArgBinder (#1352) · 1e92d11c
      Lei Wang authored
      * [Refactor] Improve assertion handling in CodeGenCHost and ArgBinder
      
      This commit refines the assertion message generation in CodeGenCHost by optimizing the handling of equality checks and reducing buffer size for error messages. Additionally, it enhances the ArgBinder by introducing a nullable guard mechanism for assertions, allowing for more precise error handling when binding arguments. The changes improve the clarity and efficiency of assertion handling across the codebase.
      
      * [Enhancement] Update matmul kernel and optimize argument binding
      
      This commit enhances the matmul kernel by introducing additional tensor parameters and refining the pipeline stages for improved performance. It also updates the argument binding mechanism to include a flag indicating whether buffers are used, enhancing the efficiency of buffer management. Furthermore, the optimization phase in the engine is improved by adding a simplification step, ensuring better performance and clarity in the generated code.
      
      * lint fix
      
      * [Enhancement] Add tensor checks documentation and improve argument binding assertions
      
      This commit introduces a new documentation page for host-side tensor checks, detailing the automatic validations performed by TileLang on kernel arguments. It enhances the ArgBinder by adding assertions for non-null pointers when arguments are used, improving error handling. Additionally, the optimization phase in the engine is updated to include a simplification step, ensuring better performance and clarity in the generated code.
      
      * [Enhancement] Update .gitignore and refine matmul kernel for improved performance
      
      This commit adds host checks logs to the .gitignore file to prevent unnecessary log files from being tracked. Additionally, it refines the matmul kernel by adjusting pipeline stages, updating tensor parameters, and enhancing argument handling for better performance. The changes also include improved error messages in the argument binding process, ensuring clearer diagnostics for users.
      
      * lint fix
      
      * lint fix
      
      * [Refactor] Simplify tensor_null_test function and remove ptr_null_test
      
      This commit refactors the tensor_null_test function by adding a with_bias parameter and removing the ptr_null_test function, which was previously unused. The run_test function is updated to reflect these changes, streamlining the testing process for tensor operations.
      
      * lint fix
      
      * fix
      1e92d11c
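A host-side argument check of the kind this commit describes might look like the sketch below. It is written in plain Python rather than generated host code, and every name in it (`TensorArg`, `check_tensor`, `is_used`) is hypothetical. The null-pointer check gated on whether the argument is used follows the commit message; the rank, shape, and dtype checks are assumptions about what such validation typically covers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TensorArg:
    """Hypothetical description of a kernel argument on the host side."""
    name: str
    data: Optional[object]  # None models a null data pointer
    shape: Tuple[int, ...]
    dtype: str


def check_tensor(arg, expected_shape, expected_dtype, is_used=True):
    """Return a list of problems; an empty list means the argument passed."""
    problems = []
    if arg.data is None:
        # Null is only an error when the kernel actually reads or writes the buffer.
        if is_used:
            problems.append(f"{arg.name}: data pointer is null")
        return problems
    if len(arg.shape) != len(expected_shape):
        problems.append(
            f"{arg.name}: expected {len(expected_shape)} dims, got {len(arg.shape)}"
        )
    elif tuple(arg.shape) != tuple(expected_shape):
        problems.append(
            f"{arg.name}: expected shape {tuple(expected_shape)}, got {tuple(arg.shape)}"
        )
    if arg.dtype != expected_dtype:
        problems.append(f"{arg.name}: expected dtype {expected_dtype}, got {arg.dtype}")
    return problems
```

Returning messages instead of raising on the first failure lets a caller report every mismatch for an argument at once; an optional input such as a bias can be passed as null with `is_used=False`.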
    • Add sparse fine-tuning kernel for deepseek sparse attention to example (#1296) · b8240b7a
      Yuxuan Hu authored
      * [EXAMPLE] add example for dsa sparse finetuning
      
      * [Refactor]
      b8240b7a
  9. 26 Nov, 2025 7 commits
    • [Enhancement] Add support for k_pack in gemm_mfma (#1344) · 6bae64f6
      Gongen-Ali authored
      * add support for k_pack
      
      * support benchmark on ROCm
      
      * fix format
      6bae64f6
    • Kuris's avatar
      4f844000
    • [Refactor] Enhance CopyNode's IterVar Creation and Range Handling (#1346) · 17718bec
      Lei Wang authored
      * [Refactor] Enhance CopyNode's IterVar Creation and Range Handling
      
      This commit refines the `MakeIterVars` method in `CopyNode` to select base ranges based on memory scope levels, ensuring that the chosen ranges are not smaller than the original source ranges. Additionally, it updates the Python `copy` function to clarify range handling, including broadcasting logic and extent alignment. These changes improve the robustness and clarity of the copy operation's implementation.
      
      * test fix
      17718bec
    • [Enhancement] add more dtype and fix mma.ws for fp16 for tcgen05 (#1327) · f0c721a4
      Yunqian Fan authored
      * feat: add fp8 variants; add placeholder for fp6/fp4 in meta
      
      support ld with pack for fp32 dtype
      
      add dump
      
      add template expand
      
      remove unused dtype and change to rebased apis
      
      * fix: when atom-m!=128, enable_ws
      
      * fix: typo in tcgen05 meta; dispatch in gemm sm100
      f0c721a4
    • [Refactor] Phaseout vmap for Tile Operators (#1334) · f5d9da46
      Lei Wang authored
      
      
      * Refactor GEMM and Reduce operations by moving NormalizeToBufferRegion and MakeAccessPtrFromRegion to utils.{h,cc} for better code organization and reuse.
      
      * lint fix
      
      * Refactor region handling by removing the RegionOp and updating NormalizeToBufferRegion to only accept BufferLoad and BufferRegion. This change improves code organization and simplifies the handling of memory regions across various operations.
      
      * fix
      
      * Refactor memory region handling by introducing `tl.region` calls across various operations, including GEMM and fill functions. This change enhances the consistency of region management and improves code organization by utilizing utility functions for buffer region conversions.
      
      * fix
      
      * fix
      
      * test fix
      
      * lint fix
      
      * Refactor GEMM operations to improve memory region handling by replacing `mbarPtr_` with `mbarRegion_` and updating related logic in both C++ and Python implementations. This change enhances the clarity and consistency of buffer region management.
      
      * fix
      
      * lint fix
      
      * fix
      
      * fix
      
      * test fix
      
      * lint fix
      
      * lint fix
      
      * minor fix
      
      * fix
      
      ---------
      Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
      f5d9da46
    • [Feat] Extend LegalizeNegativeIndex to support buffer store stmts (#1339) · fac04006
      ConvolutedDog authored
      This commit enhances the LegalizeNegativeIndex transformation pass to handle
      both buffer load and store operations with negative indices and adds some
      test cases.
      fac04006
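The effect of legalizing a negative index is the same for loads and stores: an index known to be negative is shifted by the buffer extent so that it falls in `[0, extent)`. The sketch below emulates this at run time on a Python list with hypothetical helper names. The actual pass rewrites index expressions at compile time, which this sketch does not model.

```python
def legalize_index(index, extent):
    """Map a possibly negative index onto [0, extent)."""
    if index < 0:
        index += extent
    assert 0 <= index < extent, "index out of bounds after legalization"
    return index


def buffer_load(buf, index):
    return buf[legalize_index(index, len(buf))]


def buffer_store(buf, index, value):
    # Stores go through the same legalization as loads.
    buf[legalize_index(index, len(buf))] = value
```

Python lists already accept negative indices, so the explicit `legalize_index` call is there only to show the rewrite that a lower-level target, which has no such convention, would need.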
    • Add unit tests for T.assume (#1341) · f810f976
      LJC00118 authored
      
      
      * Add test for T.assume
      
      * Add unit test for T.assume
      
      * Add unit test for T.assume
      
      * Add unit tests for T.assume
      
      * Remove debug print for kernel source
      
      Remove print statement for kernel source in tests.
      
      * Update test_tilelang_language_assume.py
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
      f810f976
  10. 25 Nov, 2025 5 commits
  11. 24 Nov, 2025 2 commits