src/transform/warp_specialized_rewriter.cc · 05f2fc6d30ed47e53a02e4b4a58bbf853f0cdb08 · OpenDAS / tilelang


[Enhancement] Enhance warp specialization logic (#680) · 05f2fc6d

Yu Cheng authored Jul 31, 2025



- Removed unnecessary configurations from the `@tilelang.jit` decorator in `example_grouped_gemm_fwd.py`, simplifying the kernel compilation process.
- Updated the `grouped_gemm` function to accept a tuple for batch sizes, enhancing compatibility with the kernel invocation.
- Added logic in `warp_specialized_rewriter.cc` to track buffer usage in `CallNode` expressions, improving the handling of TMA load operations.

This refactor aims to streamline the code and improve maintainability while ensuring better performance in grouped matrix multiplication operations.

Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>


warp_specialized_rewriter.cc 41.4 KB
