[Layout] Introduce Flexible Parallel to Support T.serial and local buffers inside T.Parallel loop (#844)
* Support T.serial and local buffers inside T.Parallel loop.
* Fix reducer layout in T.Parallel nested inside other loops
* Debug output with LOG(INFO)
* Add disable option for WGMMA.
* fix
* Use DLOG; fix missing registration for new pass config
* bug fix
* lint fix
* Enhance GEMM instruction set with UTCMMA and improve local buffer handling in casting example
* Update format.sh shebang, improve logging in layout inference, and enhance buffer store wrapper with detailed comments
* Enhance GEMM instantiation logic and improve layout inference for local buffer detection
- Add a WGMMA compatibility check to the GEMM instantiation logic, making the conditions for using WGMMA more robust.
- Refine layout inference to better identify loops that manipulate only local buffers, improving the accuracy of thread-binding decisions in parallel loops.
---------
Co-authored-by: Huanqi Cao <caohuanqi@deepseek.com>