Commit 8205791d authored by Lei Wang, committed by GitHub

[Refactor] Remove small array reuse condition in shared memory allocation merging (#654)

- Removed the early return that disabled reuse of small arrays (`0 < const_nbits <= 32`) when freeing storage entries in `MergeSharedMemoryAllocations`, so small shared-memory buffers are now returned to the free map and can be reused like any other allocation.
- Added a comment in `OptimizeForTarget` explaining why `MergeSharedMemoryAllocations` must run after `SplitHostDevice`: the merged allocation site is at the beginning of each device function.
parent 6e994b12
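
To illustrate the effect of the first change, here is a minimal sketch of a size-keyed free list. It is a toy model, not the actual planner: `plan_shared_memory`, the event tuples, and the `skip_small` flag are all hypothetical names introduced for this example, and the real pass applies additional matching rules that are not modeled here. With `skip_small=True` the sketch mimics the removed early return; with `skip_small=False` it mimics the behavior after this commit.

```python
from bisect import bisect_left, insort


def plan_shared_memory(events, skip_small=False, small_bits=32):
    """Toy planner. events are ("alloc", name, nbits) or ("free", name).

    Returns (total_bits, assignment), where assignment maps name -> slot id.
    """
    free_slots = []   # sorted list of (nbits, slot_id), like a size-keyed free map
    slot_size = {}    # slot_id -> nbits
    assignment = {}   # buffer name -> slot_id
    for ev in events:
        if ev[0] == "alloc":
            _, name, nbits = ev
            # Pick the smallest free slot that is large enough, if any.
            i = bisect_left(free_slots, (nbits, -1))
            if i < len(free_slots):
                _, slot = free_slots.pop(i)
            else:
                slot = len(slot_size)
                slot_size[slot] = nbits
            assignment[name] = slot
        else:
            _, name = ev
            slot = assignment[name]
            nbits = slot_size[slot]
            # Assumption: models the removed check, which skipped the free
            # map for entries with 0 < const_nbits <= 32.
            if skip_small and 0 < nbits <= small_bits:
                continue
            insort(free_slots, (nbits, slot))
    return sum(slot_size.values()), assignment


events = [
    ("alloc", "a", 32), ("free", "a"),
    ("alloc", "b", 32), ("free", "b"),
    ("alloc", "big", 1024), ("free", "big"),
    ("alloc", "big2", 1024),
]
old_total, _ = plan_shared_memory(events, skip_small=True)
new_total, _ = plan_shared_memory(events, skip_small=False)
print(old_total, new_total)  # 1088 1056
```

In this toy trace the two 32-bit buffers have disjoint lifetimes, so once small entries are returned to the free list they share one slot and the total footprint drops from 1088 to 1056 bits.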
@@ -965,10 +965,6 @@ private:
     StorageEntry *e = it->second;
     ICHECK_NE(e->allocs.size(), 0U);
-    // disable reuse of small arrays
-    if (e->const_nbits > 0 && e->const_nbits <= 32)
-      return;
     // normal free.
     if (e->const_nbits != 0) {
       const_free_map_.insert({e->const_nbits, e});
......
@@ -163,7 +163,8 @@ def OptimizeForTarget(mod: IRModule, target: Target) -> IRModule:
     mod = tilelang.transform.ThreadSync("global")(mod)
     mod = tilelang.transform.AnnotateDeviceRegions()(mod)
     mod = tir.transform.SplitHostDevice()(mod)
+    # MergeSharedMemoryAllocations must be applied after SplitHostDevice
+    # because the merged allocation site is at the beginning of each device function
     enable_aggressive_merge = should_enable_aggressive_merge(pass_ctx=pass_ctx, target=target)
     # Hopper Swizzling requires dynamic shared memory address to be aligned to 1024 bytes
     # For other devices, we align to 16 bytes
......
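
The ordering constraint in the added comment can be sketched with a toy model. This is only an illustration under stated assumptions: functions are plain Python lists of strings, a nested list stands for a device region, and `split_host_device` and `merge_shared_allocs` are hypothetical stand-ins, not the real `tir.transform.SplitHostDevice` or `MergeSharedMemoryAllocations` passes. The point it shows is that a merge step which places one combined allocation at the start of a function body needs the device function to exist as its own body first.

```python
def split_host_device(func):
    """Toy split: move each nested device region into its own kernel body."""
    host, kernels = [], []
    for stmt in func:
        if isinstance(stmt, list):
            kernels.append(list(stmt))
            host.append(f"launch:kernel{len(kernels) - 1}")
        else:
            host.append(stmt)
    return host, kernels


def merge_shared_allocs(body):
    """Toy merge: fold all 'alloc_shared:*' statements of a flat body into
    one merged allocation placed at the very start of that body."""
    allocs = [s for s in body if s.startswith("alloc_shared:")]
    rest = [s for s in body if not s.startswith("alloc_shared:")]
    if not allocs:
        return list(body)
    names = "+".join(s.split(":", 1)[1] for s in allocs)
    return [f"alloc_shared:{names}"] + rest


# Hypothetical input: one host statement followed by one device region.
func = ["host_setup", ["alloc_shared:A", "compute1", "alloc_shared:B", "compute2"]]

# Split first, then merge inside each device function.
host, kernels = split_host_device(func)
merged = [merge_shared_allocs(k) for k in kernels]
print(host)       # ['host_setup', 'launch:kernel0']
print(merged[0])  # ['alloc_shared:A+B', 'compute1', 'compute2']
```

Running the merge on each kernel after the split places the combined allocation as the first statement of the device function, which is the allocation site the comment describes.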