Commits · 68989d80858d8b034137330cf4fce1165a5db933 · OpenDAS / tilelang

26 Jun, 2025 1 commit

[Enhancement] Introduce PassConfig `TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE`... · 3ca5a4ba

Lei Wang authored Jun 27, 2025

[Enhancement] Introduce PassConfig `TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE` to enable aggressive shared memory reuse (#602)

* [Enhancement] Add aggressive shared memory merge option in memory allocation

- Introduced a new configuration option `tl.enable_aggressive_shared_memory_merge` to enable aggressive merging of shared memory allocations.
- Updated the `SharedMemLinearAccessPatternFinder` class to support an aggressive merge strategy, allowing for improved memory reuse.
- Modified the `MergeSharedMemoryAllocations` function to incorporate the new merging strategy based on the configuration.
- Enhanced the `PassConfigKey` enumeration to include the new aggressive merge option, ensuring it can be configured appropriately.

* lint fix

* [Enhancement] Add aggressive shared memory merge configuration option

- Introduced a new configuration option `kEnableAggressiveSharedMemoryMerge` to enable aggressive merging of shared memory allocations, enhancing memory management capabilities.

* [Enhancement] Update MergeSharedMemoryAllocations to support aggressive merge option

- Modified the `MergeSharedMemoryAllocations` function to accept an `enable_aggressive_merge` parameter, allowing for more flexible memory management.
- Introduced a new helper function `should_enable_aggressive_merge` to determine the aggressive merge configuration based on the pass context and target.
- Updated the relevant calls in the `phase.py` and `__init__.py` files to utilize the new aggressive merge functionality, enhancing the overall memory allocation strategy.
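
The commit routes the new option through a helper, `should_enable_aggressive_merge`, which reads the pass context and the target. The following is a minimal sketch of only the config-lookup half, under stated assumptions: the pass context is modeled as a plain dict keyed by the option string `tl.enable_aggressive_shared_memory_merge`, and the flag is assumed to default to off. The real helper works with TVM's `PassContext` and also consults the target, which this sketch omits.

```python
from typing import Any, Mapping, Optional

# Option string from the commit message. Assumption: the real code reaches it
# via tilelang's PassConfigKey enum and TVM's PassContext, not a plain dict.
AGGRESSIVE_MERGE_KEY = "tl.enable_aggressive_shared_memory_merge"


def should_enable_aggressive_merge(
    pass_config: Optional[Mapping[str, Any]] = None,
) -> bool:
    """Return True only when the config explicitly turns aggressive merging on.

    Sketch only: the target-dependent checks of the real helper are omitted.
    """
    if pass_config is None:
        return False  # assumed default: conservative (non-aggressive) merging
    return bool(pass_config.get(AGGRESSIVE_MERGE_KEY, False))


if __name__ == "__main__":
    # The caller reads the flag once and passes it to the merge pass as an
    # explicit argument, mirroring the `enable_aggressive_merge` parameter.
    print(should_enable_aggressive_merge({AGGRESSIVE_MERGE_KEY: True}))
```

Resolving the flag in Python and handing the pass an explicit boolean keeps the C++ pass free of config lookups, which matches how the commit describes `MergeSharedMemoryAllocations` gaining an `enable_aggressive_merge` parameter.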


18 Jun, 2025 1 commit

[Enhancement] Update warp specialization checking (#580) · 6cede73d

Lei Wang authored Jun 19, 2025

* Fix L2 cache size calculation to handle symbolic expressions and ensure float conversion of hit ratios in annotation

* [Enhancement] Update warp specialization check in phase.py

* lint fix

* [Enhancement] Add ContainsSeqStmt method to improve statement handling in merge_shared_memory_allocations.cc

* [Refactor] Simplify memory copy operations in GEMM kernel tests

- Updated memory copy operations in `test_tilelang_kernel_gemm.py` to use shared memory allocations for both A and B matrices, improving clarity and performance.
- Adjusted the main execution block to include a new `run_gemm_rs` function call for testing, enhancing the test structure.

* revert memory reuse pass.

* revert the memory reuse and thread sync passes

* Update test_tilelang_kernel_gemm.py

* Update test_tilelang_kernel_mha_bwd.py


23 May, 2025 1 commit

[Refactor] Enhance MergeSharedMemoryAllocations Pass for Improved Liveness... · 0fdefe2b

Lei Wang authored May 23, 2025

[Refactor] Enhance MergeSharedMemoryAllocations Pass for Improved Liveness Analysis and Scope Management (#508)

* Introduced a new StmtAttr structure to track the scope level of statements, enhancing the liveness analysis process.
* Updated the UpdateStmtAttr function to manage statement attributes effectively during memory allocation visits.
* Modified the VisitStmt_ methods to utilize the new scope level tracking, ensuring accurate memory access patterns.
* Refactored the LivenessAnalysis and PlanMemory functions to incorporate statement attributes, improving the handling of gen and kill points in memory management.
* Added a new helper function allow_warp_specialized in phase.py to conditionally enable warp specialization based on pass context and target, addressing potential bugs in the MergeSharedMemoryAllocations pass.
* Enhanced the OptimizeForTarget function to conditionally apply the MergeSharedMemoryAllocations pass based on warp specialization settings, improving robustness in memory allocation strategies.
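
The gen/kill points mentioned above are the core of lifetime-based shared memory reuse. The following is a minimal sketch of that generic technique, not TileLang's `LivenessAnalysis`/`PlanMemory`. It assumes a single flat sequence of statements, each listed as the set of shared buffers it touches. It ignores the scope levels that `StmtAttr` tracks, and it replaces the real planner with a greedy first-fit offset assignment. The buffer names, sizes, and the `alignment` parameter are hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple

Interval = Tuple[int, int]  # (gen, kill): inclusive statement indices


def liveness(accesses: Sequence[Set[str]]) -> Dict[str, Interval]:
    """gen = index of a buffer's first access, kill = index of its last."""
    live: Dict[str, Interval] = {}
    for i, touched in enumerate(accesses):
        for buf in touched:
            gen = live[buf][0] if buf in live else i
            live[buf] = (gen, i)
    return live


def _align(x: int, alignment: int) -> int:
    return (x + alignment - 1) // alignment * alignment


def plan_offsets(sizes: Dict[str, int], live: Dict[str, Interval],
                 alignment: int = 16) -> Tuple[Dict[str, int], int]:
    """Greedy first-fit: buffers with overlapping lifetimes never share bytes."""
    offsets: Dict[str, int] = {}
    # Place larger buffers first; break ties by name for a deterministic plan.
    for buf in sorted(sizes, key=lambda b: (-sizes[b], b)):
        gen, kill = live[buf]
        # Byte ranges of already-placed buffers that are live at the same time.
        busy = sorted(
            (offsets[o], offsets[o] + sizes[o])
            for o in offsets
            if not (live[o][1] < gen or kill < live[o][0])
        )
        candidate = 0
        for lo, hi in busy:
            if candidate + sizes[buf] <= lo:
                break  # fits in the gap before this range
            candidate = max(candidate, _align(hi, alignment))
        offsets[buf] = candidate
    total = max((offsets[b] + sizes[b] for b in offsets), default=0)
    return offsets, total


if __name__ == "__main__":
    stmts = [{"A_shared", "B_shared"}, {"A_shared", "B_shared"}, {"C_shared"}]
    sizes = {"A_shared": 4096, "B_shared": 4096, "C_shared": 8192}
    print(plan_offsets(sizes, liveness(stmts)))
```

In the example, `A_shared` and `B_shared` are live only during statements 0 and 1, and `C_shared` is live only at statement 2. `C_shared` can therefore reuse the bytes of the other two, and the merged pool needs 8192 bytes instead of 16384. Treating a statement that touches two buffers as an overlap keeps the plan conservative. This is also where pipelined or warp-specialized code gets tricky, because a flat statement order no longer reflects when each buffer is actually in use.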


16 May, 2025 1 commit

[Enhancement] Introduce flag to visualize shared memory merge plan (#496) · dca2fb48

Lei Wang authored May 16, 2025

* Remove debug print statement from block_sparse_attn_triton.py and implement a timeout handler in autotuner for function execution. This enhances the robustness of the autotuner by allowing it to handle timeouts gracefully.

* Enhance the autotuner module by adding a timeout handler for function execution, improving robustness in handling long-running tasks. This change includes the introduction of a custom TimeoutException and updates to the run_with_timeout function for better signal management.

* Add merge shared memory allocations pass and related configurations

- Introduced a new pass for merging shared memory allocations in GPU kernels, allowing for more efficient memory usage.
- Registered configuration options for debugging and controlling the merging behavior.
- Updated relevant files to integrate the new pass into the TileLang engine and transform modules.
- Adjusted import paths and added documentation for the new functionality.

* Reduce num_stages parameter in GEMM functions from 3 to 1 for improved performance in test_tilelang_kernel_gemm.py
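
The autotuner change above mentions a custom `TimeoutException` and signal handling in `run_with_timeout`. Below is a minimal sketch of that pattern using only the standard-library `signal` module. The call signature `run_with_timeout(func, timeout, *args, **kwargs)` is an assumption, not the autotuner's actual one. `SIGALRM` timers work only on Unix and only in the main thread.

```python
import signal
import time
from typing import Any, Callable


class TimeoutException(Exception):
    """Raised when a wrapped call exceeds its time budget."""


def _raise_timeout(signum: int, frame: Any) -> None:
    raise TimeoutException("function call timed out")


def run_with_timeout(func: Callable[..., Any], timeout: float,
                     *args: Any, **kwargs: Any) -> Any:
    """Run func(*args, **kwargs), raising TimeoutException after `timeout` s.

    Assumed signature; Unix-only and main-thread-only because it uses SIGALRM.
    """
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, timeout)  # sub-second timer
    try:
        return func(*args, **kwargs)
    finally:
        # Cancel any pending timer first, then restore the previous handler,
        # so a late alarm cannot fire into unrelated code.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    print(run_with_timeout(sum, 1.0, [1, 2, 3]))  # fast call returns normally
    try:
        run_with_timeout(time.sleep, 0.5, 10)  # slow call is interrupted
    except TimeoutException as exc:
        print("timed out:", exc)
```

The `finally` block matters for an autotuner that evaluates many candidate configurations in a row. Each evaluation leaves no armed timer and no replaced handler behind, so one slow candidate cannot interrupt the next.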
