    [Enhancement] Enhance warp specialization logic (#680) · 05f2fc6d
    Yu Cheng authored
    
    
    - Removed unnecessary configurations from the `@tilelang.jit` decorator in `example_grouped_gemm_fwd.py`, simplifying the kernel compilation process.
    - Updated the `grouped_gemm` function to accept a tuple for batch sizes, enhancing compatibility with the kernel invocation.
    - Added logic in `warp_specialized_rewriter.cc` to track buffer usage in `CallNode` expressions, improving the handling of TMA load operations.
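
One possible reason a tuple works better than a list for the batch sizes is hashability. That is an assumption here, not something the commit message states. The sketch below uses `functools.lru_cache` as a hypothetical stand-in for a caching kernel factory; `build_kernel` and its return value are illustrative names only and are not part of tilelang.

```python
from functools import lru_cache


# Hypothetical stand-in for a cached kernel factory (not the tilelang API).
# lru_cache keys on the call arguments, so every argument must be hashable.
@lru_cache(maxsize=None)
def build_kernel(batch_sizes, K, N):
    # A real factory would compile a kernel; this sketch returns a summary.
    return {
        "groups": len(batch_sizes),
        "total_rows": sum(batch_sizes),
        "K": K,
        "N": N,
    }


batch_sizes_list = [64, 128, 32]

# Passing the list directly fails because lists are unhashable.
try:
    build_kernel(batch_sizes_list, 256, 512)
    list_accepted = True
except TypeError:
    list_accepted = False

# Converting to a tuple works, and the second call reuses the cached result.
kernel_a = build_kernel(tuple(batch_sizes_list), 256, 512)
kernel_b = build_kernel(tuple(batch_sizes_list), 256, 512)
```

If the real factory caches on its arguments in a similar way, converting the batch sizes with `tuple(...)` at the call site is the least invasive fix, since the caller can keep building them as a list.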
    
    This refactor aims to streamline the code and improve maintainability while ensuring better performance in grouped matrix multiplication operations.
    Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
example_grouped_gemm_fwd.py 7.85 KB