1. 18 Nov, 2025 1 commit
    • Lei Wang's avatar
      [FFI] Use tvm ffi as the default execution backend (#1259) · 74da3696
      Lei Wang authored
      * [Refactor] Update FFI type handling and simplify argument management
      
      * Refactored FFI type definitions in runtime and code generation files to use `TVMFFIAny` instead of `TVMValue`, enhancing type clarity.
      * Updated function registration in `runtime.cc` to utilize canonical names for better consistency.
      * Simplified argument handling in the `simplify` transformation, ensuring unused buffer parameters are removed only when simplification is enabled.
      * Adjusted autotuner and profiler parameters to standardize the execution backend to `tvm_ffi`, improving clarity in backend selection.
      * Removed obsolete `adapt_torch2tvm` function from tensor utilities to streamline the codebase and reduce complexity.
      
      * [Update] Sync TVM submodule and enhance kernel source handling
      
      * Updated the TVM submodule to commit cdc2aced, ensuring compatibility with recent changes.
      * Added functionality to print kernel source in `example_blocksparse_gemm.py` for better debugging.
      * Commented out the main execution call in test files to prevent unintended execution during testing.
      * Introduced `tilelang.disable_cache()` in various test files to streamline testing and avoid cache-related issues.
      * Refactored kernel source retrieval methods to improve clarity and consistency across different execution backends.
      
      * [Refactor] Clean up imports and improve code formatting
      
      * Removed unused import of `tilelang.testing` in `test_example_blocksparse_gemm.py` to streamline the code.
      * Reformatted several lines in `arg_binder.cc`, `make_packed_api.cc`, `tvm_ffi.py`, and `adapter.py` for improved readability and consistency.
      * Updated comments and spacing in `tvm_ffi.py` to enhance clarity without altering functionality.
      
      * Update execution backend options and improve resolution logic
      
      - Changed default execution backend from "cython" to "auto" in multiple locations to allow automatic selection based on the target.
      - Expanded the list of supported execution backends to include "torch" and "nvrtc" across various classes and functions.
      - Enhanced backend resolution logic in `KernelCache` and `AutoTuner` to ensure appropriate backend selection based on the target.
      - Updated documentation to reflect changes in execution backend options and their defaults.
      
      * lint fix
      
      * fix
      
      * Enhance argument handling in CUDA and HIP runtime modules
      
      - Updated `ExtractFuncInfo` in `rt_mod_cuda.cc` and `rt_mod_hip.cc` to map boolean argument types to int32, ensuring compatibility with device runtime.
      - Refactored `BindDLTensor` in `arg_binder.cc` to improve null handling and validation checks for DLTensor parameters, utilizing expression-level guards to prevent dereferencing null pointers.
      - Enhanced error checking for buffer shape, strides, and data fields, ensuring robust handling of optional inputs and maintaining consistency across various checks.
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * minor fix
      
      * fix
      
      * recover check
      
      * Refactor argument binding and validation in `arg_binder.cc`
      
      - Improved null handling and validation checks in `BindDLTensor`, ensuring safe dereferencing of pointers.
      - Enhanced consistency checks for buffer shape, strides, and data fields, utilizing expression-level guards.
      - Updated `MakePackedAPI` to maintain code clarity and consistency in argument handling.
      - Minor adjustments in test files to streamline kernel execution and improve readability.
      
      * lint fix
      
      * stride fix
      
      * minor fix
      
      * fix
      
      * lint fix
      
      * lint fix
      
      * Add CUDA stream access policy window helpers and integrate with L2 persistent cache management
      
      - Introduced functions to set and reset the CUDA stream access policy window, allowing for better control over L2 cache usage.
      - Updated runtime files to include new FFI packed functions for managing stream attributes.
      - Modified lower_hopper_intrin to incorporate prologue and epilogue statements for L2 cache setup and teardown.
      - Enhanced tests to verify the inclusion of new FFI calls in the generated kernel source.
      
      * check with symbolic
      
      * support null ptr
      
      * Update CMakeLists and lower.py for code generation and subproject status
      
      - Added `codegen_c_host.cc` to the list of source files in CMakeLists.txt for improved code generation support.
      - Updated the function call in `lower.py` to use `target.build.tilelang_c` for C target host code generation, enhancing compatibility.
      - Marked the TVM subproject as dirty to indicate local modifications.
      
      * lint fix
      
      * Update comments for clarity in quickstart.py
      74da3696
  2. 02 Oct, 2025 1 commit
    • Lei Wang's avatar
      [Layout] Strict annotate completed replicated layout for fragment with constant index (#929) · fc4bd452
      Lei Wang authored
      * [Layout] Add IsCompletedReplicated method and enhance layout inference in ParallelOpNode
      
      - Introduced IsCompletedReplicated method in FragmentNode to check if a buffer is fully replicated.
      - Enhanced InferLayout in ParallelOpNode to handle layout inference for replicated buffers, ensuring only fragment[0] access is allowed.
      - Updated error handling for non-zero index access in fragment buffers to improve robustness.
      
      * [Layout] Improve code formatting and readability in layout.cc and parallel.cc
      
      - Enhanced formatting in FragmentNode's IsCompletedReplicated method for better clarity.
      - Updated InferLayout method in ParallelOpNode to improve code readability by adjusting line breaks and indentation.
      - Ensured consistent formatting across conditional statements and comments for improved maintainability.
      
      * updt
      
      * optimize const index related op
      
      * bug fix
      
      * reduce gdn test
      
      * test fix
      
      * lintfix
      
      * lint fix
      
      * test fix
      fc4bd452
  3. 07 Aug, 2025 1 commit
    • Zhengju Tang's avatar
      Gated Delta Net(GDN) kernel implementation in TileLang (#695) · 6f59668d
      Zhengju Tang authored
      * [GDN] Add examples for GDN forward and backward kernels
      
      * [Refactor] Folder structure refactor for duplicated utils
      
      * [Test] Add test script for kernels
      
      * [Refactor] Rename examples to align with the repo
      
      * [Lint] Modify README
      
      * [Update] Modified README to align upstream repo
      
      * [BugFix] Path of FLA
      
      * [Fix] Copyright and test
      
      * [Lint]
      
      * [CI] Add GDN compilation test CI
      
      * [Lint]
      
      * [BugFix] Import error of fla
      6f59668d