    [FFI] Use tvm ffi as the default execution backend (#1259) · 74da3696
    Lei Wang authored
    * [Refactor] Update FFI type handling and simplify argument management
    
    * Refactored FFI type definitions in runtime and code generation files to use `TVMFFIAny` instead of `TVMValue`, enhancing type clarity.
    * Updated function registration in `runtime.cc` to utilize canonical names for better consistency.
    * Simplified argument handling in the `simplify` transformation, ensuring unused buffer parameters are removed only when simplification is enabled.
    * Adjusted autotuner and profiler parameters to standardize the execution backend to `tvm_ffi`, improving clarity in backend selection.
    * Removed obsolete `adapt_torch2tvm` function from tensor utilities to streamline the codebase and reduce complexity.
    
    * [Update] Sync TVM submodule and enhance kernel source handling
    
    * Updated the TVM submodule to commit cdc2aced, ensuring compatibility with recent changes.
    * Added functionality to print kernel source in `example_blocksparse_gemm.py` for better debugging.
    * Commented out the main execution call in test files to prevent unintended execution during testing.
    * Introduced `tilelang.disable_cache()` in various test files to streamline testing and avoid cache-related issues.
    * Refactored kernel source retrieval methods to improve clarity and consistency across different execution backends.
    
    * [Refactor] Clean up imports and improve code formatting
    
    * Removed unused import of `tilelang.testing` in `test_example_blocksparse_gemm.py` to streamline the code.
    * Reformatted several lines in `arg_binder.cc`, `make_packed_api.cc`, `tvm_ffi.py`, and `adapter.py` for improved readability and consistency.
    * Updated comments and spacing in `tvm_ffi.py` to enhance clarity without altering functionality.
    
    * Update execution backend options and improve resolution logic
    
    - Changed default execution backend from "cython" to "auto" in multiple locations to allow automatic selection based on the target.
    - Expanded the list of supported execution backends to include "torch" and "nvrtc" across various classes and functions.
    - Enhanced backend resolution logic in `KernelCache` and `AutoTuner` to ensure appropriate backend selection based on the target.
    - Updated documentation to reflect changes in execution backend options and their defaults.
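The "auto" resolution described above can be sketched as follows. This is a minimal illustration, not tilelang's actual logic: the function name `resolve_execution_backend` and the per-target preference table are hypothetical, and only the backend names come from this commit message.

```python
# Backend names mentioned in this commit message.
SUPPORTED_BACKENDS = ("auto", "tvm_ffi", "cython", "nvrtc", "torch")

# Hypothetical per-target preferences; the real mapping may differ.
_DEFAULT_BY_TARGET = {"cuda": "tvm_ffi", "hip": "tvm_ffi", "metal": "torch"}


def resolve_execution_backend(requested: str, target: str) -> str:
    """Return a concrete backend name for a requested backend and target."""
    if requested not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown execution backend {requested!r}; "
            f"expected one of {SUPPORTED_BACKENDS}"
        )
    # An explicit choice is passed through unchanged.
    if requested != "auto":
        return requested
    # "auto": choose by target kind (first token of the target string).
    parts = target.split()
    kind = parts[0].lower() if parts else ""
    return _DEFAULT_BY_TARGET.get(kind, "tvm_ffi")
```

Resolving once, up front, would let components such as `KernelCache` and `AutoTuner` agree on one concrete backend name rather than each interpreting "auto" separately.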
    
    * lint fix
    
    * fix
    
    * Enhance argument handling in CUDA and HIP runtime modules
    
    - Updated `ExtractFuncInfo` in `rt_mod_cuda.cc` and `rt_mod_hip.cc` to map boolean argument types to int32, ensuring compatibility with device runtime.
    - Refactored `BindDLTensor` in `arg_binder.cc` to tighten null handling and validation of DLTensor parameters, using expression-level guards so that null pointers are never dereferenced.
    - Enhanced error checking for buffer shape, strides, and data fields, ensuring robust handling of optional inputs and maintaining consistency across various checks.
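The boolean-to-int32 mapping can be illustrated with a small Python sketch. The real change lives in the C++ `ExtractFuncInfo`; the helper below, `pack_launch_args`, is a hypothetical name used only to show the idea of widening boolean arguments before they reach the device launch.

```python
def pack_launch_args(args):
    """Widen boolean kernel arguments to int32 and leave others untouched.

    `args` is a list of (dtype, value) pairs; this pair layout is an
    assumption made for illustration only.
    """
    packed = []
    for dtype, value in args:
        if dtype == "bool":
            # Pass booleans as 0/1 in a 32-bit integer slot.
            packed.append(("int32", int(value)))
        else:
            packed.append((dtype, value))
    return packed
```

For example, `pack_launch_args([("bool", True), ("float32", 1.5)])` returns `[("int32", 1), ("float32", 1.5)]`.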
    
    * lint fix
    
    * lint fix
    
    * lint fix
    
    * lint fix
    
    * minor fix
    
    * fix
    
    * recover check
    
    * Refactor argument binding and validation in `arg_binder.cc`
    
    - Improved null handling and validation checks in `BindDLTensor`, ensuring safe dereferencing of pointers.
    - Enhanced consistency checks for buffer shape, strides, and data fields, utilizing expression-level guards.
    - Updated `MakePackedAPI` to maintain code clarity and consistency in argument handling.
    - Minor adjustments in test files to streamline kernel execution and improve readability.
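The guard pattern described above (never touch a field of a tensor that may be null) can be sketched in Python. `FakeDLTensor` and `check_tensor` are hypothetical stand-ins for illustration, and the requirement that strides be compact is an assumption, not necessarily what `BindDLTensor` enforces.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class FakeDLTensor:
    shape: Sequence[int]
    strides: Optional[Sequence[int]]  # None means compact row-major
    data: Optional[int]               # address, or None for a null pointer


def compact_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a densely packed tensor."""
    strides, running = [], 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def check_tensor(t: Optional[FakeDLTensor], expected_shape: Sequence[int],
                 optional: bool = False) -> List[str]:
    """Return a list of validation errors; every field access is guarded."""
    if t is None:
        # Null handle: acceptable only for optional inputs, and no field is read.
        return [] if optional else ["tensor is required but got null"]
    errors = []
    if tuple(t.shape) != tuple(expected_shape):
        errors.append(f"shape mismatch: {tuple(t.shape)} != {tuple(expected_shape)}")
    # Strides may legitimately be absent; compare them only when present.
    if t.strides is not None and tuple(t.strides) != compact_strides(t.shape):
        errors.append("strides are not compact")
    if t.data is None:
        errors.append("data pointer is null")
    return errors
```

Collecting the errors into a list, instead of raising on the first failure, keeps the shape, strides, and data checks independent of one another.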
    
    * lint fix
    
    * stride fix
    
    * minor fix
    
    * fix
    
    * lint fix
    
    * lint fix
    
    * Add CUDA stream access policy window helpers and integrate with L2 persistent cache management
    
    - Introduced functions to set and reset the CUDA stream access policy window, allowing for better control over L2 cache usage.
    - Updated runtime files to include new FFI packed functions for managing stream attributes.
    - Modified `lower_hopper_intrin` to incorporate prologue and epilogue statements for L2 cache setup and teardown.
    - Enhanced tests to verify the inclusion of new FFI calls in the generated kernel source.
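The prologue/epilogue pairing for the L2 access policy window can be sketched as a Python context manager. This only models the call ordering: `set_window` and `reset_window` are hypothetical callables standing in for the new FFI packed functions, whose real names and signatures are not given in this message.

```python
from contextlib import contextmanager


@contextmanager
def l2_persistent_window(set_window, reset_window, base_ptr, num_bytes, hit_ratio=1.0):
    """Set the access policy window before the body and always reset it after."""
    set_window(base_ptr, num_bytes, hit_ratio)  # prologue
    try:
        yield
    finally:
        reset_window()  # epilogue runs even if the kernel launch raises


# Usage with recording stubs in place of the real runtime calls.
calls = []
with l2_persistent_window(
    lambda ptr, size, ratio: calls.append(("set", ptr, size, ratio)),
    lambda: calls.append(("reset",)),
    base_ptr=0x1000,
    num_bytes=4096,
):
    calls.append(("launch",))
```

Putting the reset in a `finally` clause mirrors the intent of an epilogue statement: the stream attribute does not stay set after the kernel has finished or failed.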
    
    * check with symbolic
    
    * support null ptr
    
    * Update CMakeLists and lower.py for code generation and subproject status
    
    - Added `codegen_c_host.cc` to the list of source files in `CMakeLists.txt` for improved code generation support.
    - Updated the function call in `lower.py` to use `target.build.tilelang_c` for C target host code generation, enhancing compatibility.
    - Marked the TVM subproject as dirty to indicate local modifications.
    
    * lint fix
    
    * Update comments for clarity in quickstart.py
CMakeLists.txt 9.9 KB