[Refactor] Remove small array reuse condition in shared memory allocation merging (#654)

- Removed the check in `MergeSharedMemoryAllocations` that disabled reuse of small arrays (`0 < const_nbits <= 32`). Freed small allocations are now returned to the free map and can be reused like any other constant-size allocation.
- Added a comment in `OptimizeForTarget` explaining why `MergeSharedMemoryAllocations` is applied after `SplitHostDevice`: the merged allocation site is placed at the beginning of each device function.
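
The effect of dropping the small-array check can be illustrated with a toy model. This is only a sketch, not the pass itself: `ToyFreeList`, its `skip_small` flag, and the first-fit `request` method are hypothetical names invented for illustration, and the plain list stands in for the `const_free_map_` seen in the diff.

```python
from dataclasses import dataclass, field


@dataclass
class StorageEntry:
    """Toy stand-in holding only the two fields the diff touches."""
    const_nbits: int
    allocs: list = field(default_factory=list)


class ToyFreeList:
    """Hypothetical model of the free step; not the real C++ class."""

    def __init__(self, skip_small: bool):
        # skip_small=True mimics the removed guard, False mimics the new code.
        self.skip_small = skip_small
        self.const_free = []  # simplified stand-in for const_free_map_

    def free(self, entry: StorageEntry) -> None:
        assert len(entry.allocs) != 0
        # The removed guard: small arrays were never made reusable.
        if self.skip_small and 0 < entry.const_nbits <= 32:
            return
        # Constant-size entries go back into the free collection.
        # Entries with const_nbits == 0 are ignored in this toy model.
        if entry.const_nbits != 0:
            self.const_free.append(entry)

    def request(self, nbits: int):
        # Assumed first-fit lookup: return a freed entry that is large enough.
        for i, entry in enumerate(self.const_free):
            if entry.const_nbits >= nbits:
                return self.const_free.pop(i)
        return None


small = StorageEntry(const_nbits=32, allocs=["flag"])

before = ToyFreeList(skip_small=True)
before.free(small)
reused_before = before.request(32)  # nothing to reuse, guard dropped the entry

after = ToyFreeList(skip_small=False)
after.free(small)
reused_after = after.request(32)  # the freed small entry is handed back
```

In this model a 32-bit entry freed under the old guard is never offered again, while without the guard a later request of the same size gets it back.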

Commit 8205791d · authored Jul 21, 2025 by Lei Wang · committed by GitHub Jul 21, 2025

Showing with 2 additions and 5 deletions

src/transform/merge_shared_memory_allocations.cc +0 -4

tilelang/engine/phase.py +2 -1

--- a/src/transform/merge_shared_memory_allocations.cc
+++ b/src/transform/merge_shared_memory_allocations.cc
@@ -965,10 +965,6 @@ private:
    StorageEntry *e = it->second;
    ICHECK_NE(e->allocs.size(), 0U);
-    // disable reuse of small arrays
-    if (e->const_nbits > 0 && e->const_nbits <= 32)
-      return;
    // normal free.
    if (e->const_nbits != 0) {
      const_free_map_.insert({e->const_nbits, e});

--- a/tilelang/engine/phase.py
+++ b/tilelang/engine/phase.py
@@ -163,7 +163,8 @@ def OptimizeForTarget(mod: IRModule, target: Target) -> IRModule:
        mod = tilelang.transform.ThreadSync("global")(mod)
    mod = tilelang.transform.AnnotateDeviceRegions()(mod)
    mod = tir.transform.SplitHostDevice()(mod)
+    # MergeSharedMemoryAllocations must be applied after SplitHostDevice
+    # because the merged allocation site is at the beginning of each device function
    enable_aggressive_merge = should_enable_aggressive_merge(pass_ctx=pass_ctx, target=target)
    # Hopper Swizzling requires dynamic shared memory address to be aligned to 1024 bytes
    # For other devices, we align to 16 bytes
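
The alignment requirement mentioned in the comments above (1024 bytes for Hopper swizzling, 16 bytes otherwise) comes down to rounding offsets up to a multiple of the alignment. The following is a minimal sketch of that arithmetic only. `align_up` and `pack` are hypothetical helpers, and the assumption that every buffer offset is aligned is made for illustration; the actual layout logic of the pass is not shown in this diff.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (offset + alignment - 1) // alignment * alignment


def pack(sizes, alignment):
    """Place buffers back to back, aligning each start offset.

    Returns the list of start offsets and the total bytes used.
    """
    offsets = []
    cursor = 0
    for size in sizes:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# Two buffers of 100 and 40 bytes under each alignment.
default_layout = pack([100, 40], 16)    # ([0, 112], 152)
hopper_layout = pack([100, 40], 1024)   # ([0, 1024], 1064)
```

Under these assumptions the larger alignment trades padding for swizzle-friendly addresses: the same two buffers span 152 bytes at 16-byte alignment and 1064 bytes at 1024-byte alignment.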