[Enhancement] Enhance warp specialization logic (#680)
- Removed unnecessary configurations from the `@tilelang.jit` decorator in `example_grouped_gemm_fwd.py`, simplifying the kernel compilation process.
- Updated the `grouped_gemm` function to accept a tuple for batch sizes, enhancing compatibility with the kernel invocation.
- Added logic in `warp_specialized_rewriter.cc` to track buffer usage in `CallNode` expressions, improving the handling of TMA load operations.
This refactor aims to streamline the code and improve maintainability while ensuring better performance in grouped matrix multiplication operations.
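
One plausible reason to pass batch sizes as a tuple is that tuples are immutable and hashable, so they can serve as part of a cache key when a kernel is built once per configuration. The sketch below illustrates only that idea in plain Python. It does not use the tilelang API; `normalize_batch_sizes` and `build_kernel` are hypothetical names, and the returned dictionary is a stand-in for a compiled kernel.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def normalize_batch_sizes(batch_sizes: Sequence[int]) -> Tuple[int, ...]:
    """Convert any sequence of batch sizes into a hashable tuple of ints."""
    return tuple(int(b) for b in batch_sizes)


@lru_cache(maxsize=None)
def build_kernel(batch_sizes: Tuple[int, ...], K: int, N: int) -> dict:
    """Hypothetical stand-in for a per-configuration kernel build.

    lru_cache requires hashable arguments, which is why the batch
    sizes must arrive as a tuple rather than a list.
    """
    # Prefix sums give the starting row of each group in the packed input.
    offsets = [0]
    for b in batch_sizes:
        offsets.append(offsets[-1] + b)
    return {
        "batch_offsets": tuple(offsets),
        "total_rows": offsets[-1],
        "K": K,
        "N": N,
    }


# Callers may hold a list; normalize it before invoking the cached builder.
kernel = build_kernel(normalize_batch_sizes([64, 128, 32]), K=256, N=512)
```

Calling `build_kernel` again with the same tuple returns the cached object, whereas passing a list directly raises `TypeError` because lists are unhashable.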
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>