- 18 May, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Update JIT kernel functions and streamline GEMM tests * Renamed and refactored matmul and run_gemm functions to matmul_kernel_jit and run_gemm_kernel_jit for clarity. * Removed redundant JIT decorator from the matmul function, ensuring it is applied only to the kernel function. * Updated test function names to reflect changes in the kernel functions, enhancing consistency and readability. * Cleaned up commented-out code and unnecessary imports to improve overall code quality. * Update main function call in GEMM test to use tilelang testing framework * Update README and example scripts to include JIT decorator comments * Added comments in README.md and various example scripts to indicate the use of the @tilelang.jit decorator for returning torch functions. * Removed redundant comments that previously instructed to add the decorator, streamlining the documentation and improving clarity. * Update GEMM test parameters for improved performance * Set num_stages to 0 and adjusted matrix dimensions in test functions to enhance performance and consistency across GEMM tests in test_tilelang_kernel_gemm.py.
-
- 26 Mar, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Improve flash attention example and layout comparison logic - Removed unnecessary annotation for `lse_local_split` in the flash attention example to streamline the code. - Updated the handling of `lse_local_split` to utilize parallel processing for better performance. - Refactored kernel compilation and profiling logic to enhance clarity and maintainability in the flash attention example. - Added a condition in `FragmentNode::IsEqual` to handle broadcast cases, improving the robustness of layout comparisons. * lint fix * [Enhancement] Add support for shared memory scope in Fill operation - Introduced handling for `shared.dyn` and `shared` memory scopes in the Fill operation. - Implemented parallel operation and layout inference for improved performance in shared memory scenarios. - Updated thread loop partitioning and vectorization logic to accommodate new memory scope handling. * [Refactor] Remove deprecated decorator and enhance Cython kernel handling - Removed the deprecated decorator from the main module and added a new implementation in the utils module for better organization. - Introduced a pointer map in the Cython kernel adapter to manage pointer arguments, improving runtime shape resolution. - Updated the Cython kernel wrapper to utilize the new pointer map for handling kernel arguments. - Enhanced error checking in the tensor utility functions to ensure static shapes are enforced. - Added a new proxy module for buffer and tensor handling, streamlining the interface for TIR programs. * [Feature] Add matrix multiplication test and kernel implementation - Introduced a new test file `test_tilelang_language_ptr.py` that implements a matrix multiplication function using TileLang's primitives. - The `matmul_test` function defines a kernel for performing tile-level GEMM operations with customizable block sizes and data types. - Added a `run_matmul` function to compile and execute the kernel, along with a test function to validate the implementation. - Updated the `proxy.py` file to enhance type handling for buffer and tensor proxies, ensuring compatibility with TIR programs. - Minor formatting improvements in `deprecated.py` for better readability. * lint fix * [Refactor] Update tensor creation in matrix multiplication test - Replaced `T.Tensor.from_ptr` with `T.make_tensor` in `matmul_test` for improved clarity and consistency. - Updated imports in `__init__.py` to include `make_tensor`. - Added `make_tensor` function in `proxy.py` to streamline tensor creation from pointers. * [Refactor] Update tensor definitions across multiple files - Replaced instances of `T.Tensor` with updated tensor definitions in various benchmark and example files to enhance consistency and clarity. - Adjusted tensor shapes and types in functions related to matrix multiplication, attention mechanisms, and other operations. - Improved documentation in README and example files to reflect changes in tensor usage. * lint fix * [Refactor] Update tensor types in attention and matrix multiplication examples - Replaced instances of `T.Tensor` with `T.SharedTensor` and `T.FragmentTensor` in various attention and matrix multiplication functions to improve consistency and clarity. - Adjusted tensor definitions in benchmark and example files to align with the new tensor types. - Enhanced the overall structure and readability of the code by standardizing tensor usage across multiple files. * lint fix * [Refactor] Update tensor types in GEMM example and test files - Replaced instances of `T.Tensor` with `T.LocalTensor` and `T.Buffer` in the GEMM example and related test functions to improve consistency and clarity. - Enhanced the overall structure of the code by standardizing tensor usage across multiple files, aligning with recent updates in tensor definitions. * [Refactor] Update tensor usage in customize.py - Replaced instances of `T.Tensor` with `T.Buffer` in the `reshape` and `view` functions to enhance consistency with recent tensor definitions. - Improved code clarity by standardizing buffer usage across the file. * [Refactor] Update tensor types in test_tilelang_transform_annotate_device_regions.py - Replaced instances of `T.Tensor` with `T.Buffer` in the `before` and `expected` methods of the `TestAnnotateThreadExtent` and `TestAnnotateDeviceScope` classes to enhance consistency with recent tensor definitions. - Improved code clarity by standardizing buffer usage across the test file. * [Refactor] Update tensor types to SharedBuffer and FragmentBuffer - Replaced instances of `T.SharedTensor` and `T.FragmentTensor` with `T.SharedBuffer` and `T.FragmentBuffer` across multiple benchmark, example, and test files to enhance consistency with recent tensor definitions. - Improved code clarity and structure by standardizing buffer usage in attention and matrix multiplication functions. * [Refactor] Introduce Tensor alias for Buffer in proxy.py - Added a new alias `Tensor` for `Buffer` in `proxy.py` to facilitate JIT compilation, ensuring that inputs and outputs are mapped with `torch.Tensor`. - This change enhances clarity and consistency in tensor usage across the codebase.
-
- 24 Mar, 2025 1 commit
-
-
Lei Wang authored
* Fix indentation in JIT adapter wrapper to ensure consistent formatting of return statement in generated C code. * Enhance Fill Operation in TileLang - Updated the Fill constructor to support BufferLoad instances, adding checks for ramp indices and ensuring only stride 1 ramps are processed. - Introduced a region array to manage the bounds of the fill operation, improving error checking for static regions. - Modified the MakeSIMTLoop method to utilize the new region array for loop variable bounds, enhancing flexibility in kernel generation. - Updated the fill and clear functions in fill.py to accept both tir.Buffer and tir.BufferRegion types, improving usability and type handling. * Refactor Fill Operation and Improve Readability - Simplified the Fill constructor by enhancing the handling of BufferLoad instances and ensuring proper checks for ramp indices. - Improved error messages for region size checks to enhance clarity. - Cleaned up formatting in the Fill method for better readability. - Added a blank line in the matmul function test to improve code organization. - Introduced a blank line in the fill function to enhance readability in fill.py. * Add matrix multiplication functionality and test in TileLang - Introduced a new test file `test_tilelang_language_clear.py` that implements a matrix multiplication function using TileLang's primitives. - The `matmul` function defines a kernel for performing tile-level GEMM operations with customizable block sizes and data types. - Added a `run_matmul` function to compile and execute the kernel, along with a test function to validate the implementation. - Updated the `__init__.py` in the utils module to include `map_torch_type`, enhancing type handling for tensor operations. * lint fix
-
- 23 Mar, 2025 1 commit
-
-
Lei Wang authored
* [Enhancement] Introduce caching control and frame management in TileLang - Added cache control functions (`enable_cache`, `disable_cache`, `is_cache_enabled`) in `env.py` to manage kernel caching behavior. - Updated `kernel_cache.py` to utilize the cache state, preventing unnecessary kernel compilation when caching is disabled. - Introduced a new `frame.py` module to manage LetFrame instances, including a stack for variable-value mapping and enhanced frame management. - Updated imports in various modules to accommodate new caching and frame functionalities, improving overall organization and clarity. * [Refactor] Clean up and enhance caching and frame management in TileLang - Added spacing for improved readability in `env.py` and `frame.py`. - Refactored `LetFrame` class to enhance clarity in buffer region assignment. - Ensured consistent formatting and organization across caching control and frame management functions. * [Feature] Add matrix multiplication functionality in TileLang - Introduced a new test file `test_tilelang_language_alias.py` that implements a matrix multiplication function using TileLang's primitives. - The `matmul` function defines a kernel for performing tile-level GEMM operations, with support for customizable block sizes and data types. - Added a `run_matmul` function to compile and execute the kernel, along with a test function to validate the implementation. - Updated `gemm.py` to allow `tir.Buffer` or `tir.Var` as valid argument types for the `gemm` function, enhancing flexibility in argument handling. * [Refactor] Improve formatting and readability in test_tilelang_language_alias.py - Adjusted spacing and alignment in the `matmul` and `run_matmul` functions for better readability. - Cleaned up unnecessary blank lines and ensured consistent formatting throughout the file. - Enhanced overall code clarity without altering functionality.
-