1. 26 Jun, 2025 1 commit
    • Lei Wang's avatar
      [Enhancement] Introduce PassConfig `TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE`... · 3ca5a4ba
      Lei Wang authored
      [Enhancement] Introduce PassConfig `TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE` to enable aggressive shared memory reuse (#602)
      
      * [Enhancement] Add aggressive shared memory merge option in memory allocation
      
      - Introduced a new configuration option `tl.enable_aggressive_shared_memory_merge` to enable aggressive merging of shared memory allocations.
      - Updated the `SharedMemLinearAccessPatternFinder` class to support an aggressive merge strategy, allowing for improved memory reuse.
      - Modified the `MergeSharedMemoryAllocations` function to incorporate the new merging strategy based on the configuration.
      - Enhanced the `PassConfigKey` enumeration to include the new aggressive merge option, ensuring it can be configured appropriately.
      
      * lint fix
      
      * [Enhancement] Add aggressive shared memory merge configuration option
      
      - Introduced a new configuration option `kEnableAggressiveSharedMemoryMerge` to enable aggressive merging of shared memory allocations, enhancing memory management capabilities.
      
      * [Enhancement] Update MergeSharedMemoryAllocations to support aggressive merge option
      
      - Modified the `MergeSharedMemoryAllocations` function to accept an `enable_aggressive_merge` parameter, allowing for more flexible memory management.
      - Introduced a new helper function `should_enable_aggressive_merge` to determine the aggressive merge configuration based on the pass context and target.
      - Updated the relevant calls in the `phase.py` and `__init__.py` files to utilize the new aggressive merge functionality, enhancing the overall memory allocation strategy.
      3ca5a4ba
  2. 18 Jun, 2025 1 commit
    • Lei Wang's avatar
      [Enhancement] Update warp specialization checking (#580) · 6cede73d
      Lei Wang authored
      * Fix L2 cache size calculation to handle symbolic expressions and ensure float conversion of hit ratios in annotation
      
      * [Enhancement] Update warp specialization check in phase.py
      
      * lint fix
      
      * [Enhancement] Add ContainsSeqStmt method to improve statement handling in merge_shared_memory_allocations.cc
      
      * [Refactor] Simplify memory copy operations in GEMM kernel tests
      
      - Updated memory copy operations in `test_tilelang_kernel_gemm.py` to use shared memory allocations for both A and B matrices, improving clarity and performance.
      - Adjusted the main execution block to include a new `run_gemm_rs` function call for testing, enhancing the test structure.
      
      * revert memory reuse pass.
      
      * revert the memory resue and thread sync pass/
      
      * Update test_tilelang_kernel_gemm.py
      
      * Update test_tilelang_kernel_mha_bwd.py
      6cede73d
  3. 23 May, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Enhance MergeSharedMemoryAllocations Pass for Improved Liveness... · 0fdefe2b
      Lei Wang authored
      [Refactor] Enhance MergeSharedMemoryAllocations Pass for Improved Liveness Analysis and Scope Management (#508)
      
      * Introduced a new StmtAttr structure to track the scope level of statements, enhancing the liveness analysis process.
      * Updated the UpdateStmtAttr function to manage statement attributes effectively during memory allocation visits.
      * Modified the VisitStmt_ methods to utilize the new scope level tracking, ensuring accurate memory access patterns.
      * Refactored the LivenessAnalysis and PlanMemory functions to incorporate statement attributes, improving the handling of gen and kill points in memory management.
      * Added a new helper function allow_warp_specialized in phase.py to conditionally enable warp specialization based on pass context and target, addressing potential bugs in the MergeSharedMemoryAllocations pass.
      * Enhanced the OptimizeForTarget function to conditionally apply the MergeSharedMemoryAllocations pass based on warp specialization settings, improving robustness in memory allocation strategies.
      0fdefe2b
  4. 16 May, 2025 1 commit
    • Lei Wang's avatar
      [Enhancement] Introduce flag to visualize shared memory merge plan (#496) · dca2fb48
      Lei Wang authored
      * Remove debug print statement from block_sparse_attn_triton.py and implement a timeout handler in autotuner for function execution. This enhances the robustness of the autotuner by allowing it to handle timeouts gracefully.
      
      * Enhance the autotuner module by adding a timeout handler for function execution, improving robustness in handling long-running tasks. This change includes the introduction of a custom TimeoutException and updates to the run_with_timeout function for better signal management.
      
      * Add merge shared memory allocations pass and related configurations
      
      - Introduced a new pass for merging shared memory allocations in GPU kernels, allowing for more efficient memory usage.
      - Registered configuration options for debugging and controlling the merging behavior.
      - Updated relevant files to integrate the new pass into the TileLang engine and transform modules.
      - Adjusted import paths and added documentation for the new functionality.
      
      * Reduce num_stages parameter in GEMM functions from 3 to 1 for improved performance in test_tilelang_kernel_gemm.py
      dca2fb48