- 18 Nov, 2025 1 commit
-
-
Chaofan Lin authored
* [Language] Add shape check in T.view/reshape * address comments
-
- 12 Nov, 2025 1 commit
-
-
Lei Wang authored
* Update layout handling and introduce reshape functionality - Updated the `LayoutNode` class to include a new `Reshape` method, allowing for dynamic reshaping of layouts based on input shapes. - Enhanced the `OutputShape` method to provide better handling of cases where the analyzer cannot form an `IntervalSet`, implementing fallback mechanisms to ensure safe extents. - Refactored the `ReduceOpNode` to utilize `BufferRegion` for improved memory handling during reduction operations. - Added tests for reshaping functionality and layout transformations to ensure correctness and performance in various scenarios. * lint fix * Revert tvm submodule pointer to 1815c3e0b6ec4ead36370bbd1562025d8529017c; keep src unchanged * Update tvm submodule to commit f0bbd3bf741413c35c389ba5dedd5be206000ad1 * Update tvm submodule to commit f0bbd3bf741413c35c389ba5dedd5be206000ad1 * remove useless prove * remove comment --------- Co-authored-by:tilelang-bot <bot@tilelang>
-
- 23 Aug, 2025 1 commit
-
-
Lei Wang authored
* Remove `thread_partial_sync.cc` and refactor `thread_storage_sync.cc` to streamline synchronization handling. Introduce `thread_sync_types.h` for thread-bound key definitions and reserved named barriers. Update related logic in `ThreadSyncInserter` and `TileLangThreadSync` for improved clarity and efficiency. * Remove `sync_thread_partial` references and related documentation from the codebase. Update CUDA and HIP code generation files to eliminate calls to the removed function. Refactor `__sync_thread_partial` to `sync_thread_partial` in CUDA common header for consistency. * Remove unused import of `bulk_copy.h` in `codegen_hip.cc` to enhance code clarity and maintainability. * Add import of `bulk_copy.h` in `codegen_hip.cc` to support new functionality. * typo fix * Update data type in reduce_sum tests from float16 to float32 for consistency and clarity. Remove redundant dtype tests and streamline run functions. Enhance reshape kernel compilation with pass configurations to address shared memory layout issues. * lint fix * test fix * Enhance CI configuration by adding verbose output to pip install command for better visibility during installation. * use ninja instead of make * Add CMake configuration step for Ninja build system in setup.py * Update pyproject.toml to include additional build dependencies: build, torch, tox, auditwheel, patchelf, and ninja. * Enhance CI configuration by adding verbose output to pytest commands for improved test visibility. * Update pyproject.toml to add Cython as a build dependency. Enhance thread storage synchronization in thread_storage_sync.cc by introducing new thread variable handling and improving index disjointness checks. * Update data type in cumulative sum tests from float16 to float32 for consistency. Modify run_cumsum function to utilize the updated dtype and enhance result validation with assertions. Adjust test cases accordingly. * Refactor storage access handling by introducing buffer data mapping in TileLangStorageAccessVisitor. Enhance access entry structure to include pointer access flag. Update thread storage synchronization to accommodate new buffer data mappings. Adjust quickstart example to print kernel source for debugging purposes. * Refactor linear index conversion in TileLangStorageAccessVisitor to utilize the analyzer for simplification. Update buffer index calculations to ensure consistent simplification of range expressions. * bugfix * Refactor buffer index calculation in TileLangStorageAccessVisitor to simplify access handling. Removed unused buffer mapping logic, ensuring consistent buffer index generation with a default ramp. * Refactor TileLangStorageAccessVisitor to replace buffer indices with buffer ranges for improved pointer access handling. Update AccessEntry structure to include buffer_ranges and adjust thread storage synchronization logic to account for pointer access conflicts. * Refactor thread storage synchronization to replace 'shared.dyn' with 'shared' for consistency in memory allocation. Update related test cases to reflect this change and ensure proper functionality.
-
- 22 Aug, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Merge bulk copy into copy and refactor layout inference for bulk copy * Deleted the `bulk_copy` operator implementation and its header file as it is no longer needed. * Introduced a new function `cuTensorMapType()` to return the data type for CUDA tensor mapping. * Updated related files to reflect these changes, ensuring that the codebase remains clean and maintainable. * lint fix * Fix typos in intrinsic names and remove unused print statement in block_sparse_attn_tilelang.py. Updated references from `ptx_ldmatirx` to `ptx_ldmatrix` across multiple files for consistency. * remove bulk copy * Refactor copy and atomic add operations to support TMA lower configuration - Updated `GetCopyInst` to accept a `disable_tma_lower` parameter, allowing for conditional usage of TMA in bulk load/store operations. - Modified `Lower` method in `Copy` to incorporate the new TMA configuration. - Refactored `AtomicAdd::Lower` to streamline layout inference and vectorization logic. - Removed unused `disable_tma_lower` field from `LowerArgs` structure for clarity. - Enhanced atomic add vectorization by replacing the buggy implementation with a more robust loop vectorization approach. * Enhance TMA bulk copy logic in `LowerBulkCopy` method - Added a condition to set `desc.swizzle` to `CU_TENSOR_MAP_SWIZZLE_NONE` when `shared_layout` matches `linear_layout`, improving clarity in layout handling. - Updated warning log to provide more detailed information about fallback scenarios, including source and destination buffer names and shapes, enhancing debugging capabilities. * lint fix * Remove fallback logging for non-swizzled global layout in `LowerBulkCopy` method to streamline the bulk copy logic. This change enhances code clarity by eliminating unnecessary warning messages related to inner box dimensions. * Enhance reshape kernel compilation in `run_reshape` and `run_reshape_smem_1d_2_2d` functions - Updated the `tl.compile` method to include `pass_configs` that disable TMA lower and warp specialization, addressing shared memory layout transformation limitations. - Added TODO comments to indicate the need for further improvements in shared memory handling. * Update `native_sparse_attention` function to include TMA configuration options - Added `pass_configs` to the JIT decorator to disable TMA lower and warp specialization, addressing potential issues with shared memory layout transformations. - Updated comments to clarify modifications in tensor shapes for inference, specifically setting `q` sequence length to 1. * Refactor JIT decorator formatting in `native_sparse_attention` function - Improved readability by reformatting the JIT decorator parameters for `native_sparse_attention`, ensuring consistent style across the codebase. - No functional changes were made; this update focuses on code clarity and maintainability. * Enhance thread management and logging in TileLang compilation - Added a method to check if printing is enabled during compilation, improving control over logging behavior. - Updated the JIT kernel class to utilize the new method for logging compilation status, ensuring consistent and clear output. - Added comments to clarify the purpose of changes and improve code readability. * Add warp specialization scope and refactor register management in TileLang - Introduced a new constant `kWarpSpecializationScope` in `builtin.h` for better attribute management. - Removed the `SetMaxNRegCollector` class and its related logic from `warp_specialized_rewriter.cc`, streamlining the warp specialization process. - Added functions `annotate_producer_reg_dealloc` and `annotate_consumer_reg_alloc` in `builtin.py` to facilitate register management. - Implemented `AnnotateWarpGroupRegAlloc` in `__init__.py` to inject register allocation calls into warp-specialized functions, enhancing the overall register handling in the compilation process. * Refactor test for InjectSetMaxNReg pass in TileLang - Improved readability by restructuring conditional checks and assertions in the test cases. - Enhanced clarity in the collection of `set_max_nreg` calls by simplifying the logic. - Ensured consistent formatting and spacing throughout the test functions for better maintainability. * Enhance bulk copy and store checks in `Copy` class - Updated scope validation for source and destination tensors in `CheckBulkLoad` and `CheckBulkStore` methods to include both `shared.dyn` and `shared` as valid options. - Modified `CheckLDSMCopy` and `CheckSTSMCopy` methods to accommodate the new scope validation, ensuring compatibility with shared memory configurations. - Improved logging in `LowerBulkCopy` to provide clearer warnings regarding unsupported swizzle layouts, including source and destination names for better debugging. * lint fix
-
- 15 Aug, 2025 1 commit
-
-
Gabriel Wu authored
* chore: fix typos * chore: fix ruff * chore: fix clang-format
-
- 30 Jul, 2025 1 commit
-
-
Lei Wang authored
* Update CI workflow to use Python 3.12 and enable build isolation for pip installations - Changed the Python version in the CI configuration from 3.9 to 3.12 to ensure compatibility with the latest features and improvements. - Updated the `PIP_NO_BUILD_ISOLATION` environment variable from `0` to `1` in the CI configuration, allowing pip to install testing requirements with build isolation enabled, which enhances the installation process during CI runs. * Update CI workflow to trigger on pull requests instead of pull_request_target - Changed the event trigger in the CI configuration from `pull_request_target` to `pull_request` to ensure the workflow runs on pull requests, enhancing the integration process. * Refactor CI workflow to remove unnecessary repository and token settings - Removed the repository and token parameters from the checkout step in the CI configuration, simplifying the workflow setup and improving security by not exposing sensitive information. * Remove pip install command from CI workflow to streamline installation process * Refactor reshape functions and tests for shared memory operations - Renamed and updated `reshape_test_smem` to `reshape_test_smem_1d_2_2d` and `run_reshape_smem` to `run_reshape_smem_1d_2_2d` for clarity. - Introduced a new reshape function `reshape_test_smem_2d_2_1d` and its corresponding runner `run_reshape_smem_2d_2_1d`. - Updated tests to reflect the new function names and added a test for the 2D to 1D reshape functionality, enhancing test coverage and clarity.
-
- 26 Mar, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Improve flash attention example and layout comparison logic - Removed unnecessary annotation for `lse_local_split` in the flash attention example to streamline the code. - Updated the handling of `lse_local_split` to utilize parallel processing for better performance. - Refactored kernel compilation and profiling logic to enhance clarity and maintainability in the flash attention example. - Added a condition in `FragmentNode::IsEqual` to handle broadcast cases, improving the robustness of layout comparisons. * lint fix * [Enhancement] Add support for shared memory scope in Fill operation - Introduced handling for `shared.dyn` and `shared` memory scopes in the Fill operation. - Implemented parallel operation and layout inference for improved performance in shared memory scenarios. - Updated thread loop partitioning and vectorization logic to accommodate new memory scope handling. * [Refactor] Remove deprecated decorator and enhance Cython kernel handling - Removed the deprecated decorator from the main module and added a new implementation in the utils module for better organization. - Introduced a pointer map in the Cython kernel adapter to manage pointer arguments, improving runtime shape resolution. - Updated the Cython kernel wrapper to utilize the new pointer map for handling kernel arguments. - Enhanced error checking in the tensor utility functions to ensure static shapes are enforced. - Added a new proxy module for buffer and tensor handling, streamlining the interface for TIR programs. * [Feature] Add matrix multiplication test and kernel implementation - Introduced a new test file `test_tilelang_language_ptr.py` that implements a matrix multiplication function using TileLang's primitives. - The `matmul_test` function defines a kernel for performing tile-level GEMM operations with customizable block sizes and data types. - Added a `run_matmul` function to compile and execute the kernel, along with a test function to validate the implementation. - Updated the `proxy.py` file to enhance type handling for buffer and tensor proxies, ensuring compatibility with TIR programs. - Minor formatting improvements in `deprecated.py` for better readability. * lint fix * [Refactor] Update tensor creation in matrix multiplication test - Replaced `T.Tensor.from_ptr` with `T.make_tensor` in `matmul_test` for improved clarity and consistency. - Updated imports in `__init__.py` to include `make_tensor`. - Added `make_tensor` function in `proxy.py` to streamline tensor creation from pointers. * [Refactor] Update tensor definitions across multiple files - Replaced instances of `T.Tensor` with updated tensor definitions in various benchmark and example files to enhance consistency and clarity. - Adjusted tensor shapes and types in functions related to matrix multiplication, attention mechanisms, and other operations. - Improved documentation in README and example files to reflect changes in tensor usage. * lint fix * [Refactor] Update tensor types in attention and matrix multiplication examples - Replaced instances of `T.Tensor` with `T.SharedTensor` and `T.FragmentTensor` in various attention and matrix multiplication functions to improve consistency and clarity. - Adjusted tensor definitions in benchmark and example files to align with the new tensor types. - Enhanced the overall structure and readability of the code by standardizing tensor usage across multiple files. * lint fix * [Refactor] Update tensor types in GEMM example and test files - Replaced instances of `T.Tensor` with `T.LocalTensor` and `T.Buffer` in the GEMM example and related test functions to improve consistency and clarity. - Enhanced the overall structure of the code by standardizing tensor usage across multiple files, aligning with recent updates in tensor definitions. * [Refactor] Update tensor usage in customize.py - Replaced instances of `T.Tensor` with `T.Buffer` in the `reshape` and `view` functions to enhance consistency with recent tensor definitions. - Improved code clarity by standardizing buffer usage across the file. * [Refactor] Update tensor types in test_tilelang_transform_annotate_device_regions.py - Replaced instances of `T.Tensor` with `T.Buffer` in the `before` and `expected` methods of the `TestAnnotateThreadExtent` and `TestAnnotateDeviceScope` classes to enhance consistency with recent tensor definitions. - Improved code clarity by standardizing buffer usage across the test file. * [Refactor] Update tensor types to SharedBuffer and FragmentBuffer - Replaced instances of `T.SharedTensor` and `T.FragmentTensor` with `T.SharedBuffer` and `T.FragmentBuffer` across multiple benchmark, example, and test files to enhance consistency with recent tensor definitions. - Improved code clarity and structure by standardizing buffer usage in attention and matrix multiplication functions. * [Refactor] Introduce Tensor alias for Buffer in proxy.py - Added a new alias `Tensor` for `Buffer` in `proxy.py` to facilitate JIT compilation, ensuring that inputs and outputs are mapped with `torch.Tensor`. - This change enhances clarity and consistency in tensor usage across the codebase.
-
- 14 Mar, 2025 1 commit
-
-
Lei Wang authored
* [Feature] Add reshape and view functionalities to tilelang language module - Introduced new test files for reshape and view operations in the tilelang language. - Implemented reshape and view functions in the customize module, enhancing buffer manipulation capabilities. - Updated the language initialization to include the new functionalities. - Removed unnecessary import from test_tilelang_language_clamp.py for cleaner code. * Update copyright to Tile-AI Corporation * [Refactor] Clean up whitespace in test files for reshape and view functionalities - Removed unnecessary blank lines in `test_tilelang_language_reshape.py` and `test_tilelang_language_view.py` for improved readability. - Ensured consistent formatting across test files to enhance code clarity.
-