- 18 Nov, 2025 2 commits
-
-
Lei Wang authored
* [Refactor] Update FFI type handling and simplify argument management * Refactored FFI type definitions in runtime and code generation files to use `TVMFFIAny` instead of `TVMValue`, enhancing type clarity. * Updated function registration in `runtime.cc` to utilize canonical names for better consistency. * Simplified argument handling in the `simplify` transformation, ensuring unused buffer parameters are removed only when simplification is enabled. * Adjusted autotuner and profiler parameters to standardize the execution backend to `tvm_ffi`, improving clarity in backend selection. * Removed obsolete `adapt_torch2tvm` function from tensor utilities to streamline the codebase and reduce complexity. * [Update] Sync TVM submodule and enhance kernel source handling * Updated the TVM submodule to commit cdc2aced, ensuring compatibility with recent changes. * Added functionality to print kernel source in `example_blocksparse_gemm.py` for better debugging. * Commented out the main execution call in test files to prevent unintended execution during testing. * Introduced `tilelang.disable_cache()` in various test files to streamline testing and avoid cache-related issues. * Refactored kernel source retrieval methods to improve clarity and consistency across different execution backends. * [Refactor] Clean up imports and improve code formatting * Removed unused import of `tilelang.testing` in `test_example_blocksparse_gemm.py` to streamline the code. * Reformatted several lines in `arg_binder.cc`, `make_packed_api.cc`, `tvm_ffi.py`, and `adapter.py` for improved readability and consistency. * Updated comments and spacing in `tvm_ffi.py` to enhance clarity without altering functionality. * Update execution backend options and improve resolution logic - Changed default execution backend from "cython" to "auto" in multiple locations to allow automatic selection based on the target. - Expanded the list of supported execution backends to include "torch" and "nvrtc" across various classes and functions. - Enhanced backend resolution logic in `KernelCache` and `AutoTuner` to ensure appropriate backend selection based on the target. - Updated documentation to reflect changes in execution backend options and their defaults. * lint fix * fix * Enhance argument handling in CUDA and HIP runtime modules - Updated `ExtractFuncInfo` in `rt_mod_cuda.cc` and `rt_mod_hip.cc` to map boolean argument types to int32, ensuring compatibility with device runtime. - Refactored `BindDLTensor` in `arg_binder.cc` to improve null handling and validation checks for DLTensor parameters, utilizing expression-level guards to prevent dereferencing null pointers. - Enhanced error checking for buffer shape, strides, and data fields, ensuring robust handling of optional inputs and maintaining consistency across various checks. * lint fix * lint fix * lint fix * lint fix * minor fix * fix * recover check * Refactor argument binding and validation in `arg_binder.cc` - Improved null handling and validation checks in `BindDLTensor`, ensuring safe dereferencing of pointers. - Enhanced consistency checks for buffer shape, strides, and data fields, utilizing expression-level guards. - Updated `MakePackedAPI` to maintain code clarity and consistency in argument handling. - Minor adjustments in test files to streamline kernel execution and improve readability. * lint fix * stride fix * minor fix * fix * lint fix * lint fix * Add CUDA stream access policy window helpers and integrate with L2 persistent cache management - Introduced functions to set and reset the CUDA stream access policy window, allowing for better control over L2 cache usage. - Updated runtime files to include new FFI packed functions for managing stream attributes. - Modified lower_hopper_intrin to incorporate prologue and epilogue statements for L2 cache setup and teardown. - Enhanced tests to verify the inclusion of new FFI calls in the generated kernel source. * check with symbolic * support null ptr * Update CMakeLists and lower.py for code generation and subproject status - Added `codegen_c_host.cc` to the list of source files in CMakeLists.txt for improved code generation support. - Updated the function call in `lower.py` to use `target.build.tilelang_c` for C target host code generation, enhancing compatibility. - Marked the TVM subproject as dirty to indicate local modifications. * lint fix * Update comments for clarity in quickstart.py
-
Jay Zhuang authored
* fix argument order for fla chunk_gated_delta_rule_fwd_h * explicit import assert_similar from utils * rename utils module to avoid name clash * set store_final_state and save_new_value to True * fix --------- Co-authored-by:LeiWang1999 <leiwang1999@outlook.com>
-
- 03 Nov, 2025 1 commit
-
-
Kurisu authored
* tilelang frontend v2 * syntax sugar: defining a local var by annotation * [Refactor] fix type linting warning like `T.float32` * Add tl.local_var_init for new tl.float32 * allow passing default argument as function annotation * allow default arguments as annotation * fix lint error * minor fix * [Refactor] refactor tilelang.jit and tilelang.autotune * minor fix * minor fix * minor fix * fix metal get function name * add par_compile impl and tests * Type consistency on tvm datatype 1. isinstance(tl.float32, tvm.DataType) == True 2. Allow `tl.float32` as function annotations 3. Allow `tl.float32` as argument to be passed to `tl.alloc` or other functions * fix lint error * add more warning in frontend * update tvm version * Minor fix on tvm_ffi annotations * add document and examples * fix lint error * Simplify index calculations in example_chunk_o_bwd.py Refactor index calculations for dg_last_fragment assignment. * minor fix * lint fix --------- Co-authored-by:
Lei Wang <leiwang1999@outlook.com> Co-authored-by:
Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 21 Oct, 2025 1 commit
-
-
Tong WU authored
* [Cleanup] Remove `tilelang.disable_cache()` calls from example scripts * lint * lint
-
- 07 Aug, 2025 1 commit
-
-
Zhengju Tang authored
* [GDN] Add examples for GDN forward and backward kernels * [Refactor] Folder structure refactor for duplicated utils * [Test] Add test script for kernels * [Refactor] Rename examples to align with the repo * [Lint] Modify README * [Update] Modified README to align upstream repo * [BugFix] Path of FLA * [Fix] Copyright and test * [Lint] * [CI] Add GDN compilation test CI * [Lint] * [BugFix] Import error of fla
-