1. 31 Jul, 2025 1 commit
    • [Enhancement] Enhance warp specialization logic (#680) · 05f2fc6d
      Yu Cheng authored
      
      
      - Removed unnecessary configurations from the `@tilelang.jit` decorator in `example_grouped_gemm_fwd.py`, simplifying the kernel compilation process.
      - Updated the `grouped_gemm` function to accept a tuple for batch sizes, enhancing compatibility with the kernel invocation.
      - Added logic in `warp_specialized_rewriter.cc` to track buffer usage in `CallNode` expressions, improving the handling of TMA load operations.
      
      This refactor aims to streamline the code and improve maintainability while ensuring better performance in grouped matrix multiplication operations.
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
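The commit message does not say why a tuple is preferable for the batch sizes. One plausible reason, stated here purely as an assumption, is that a tuple is hashable and can therefore serve as a cache key when a kernel is built once per configuration, whereas a list cannot. The sketch below illustrates that idea with the standard library only; `build_kernel` and `grouped_gemm_entry` are hypothetical names and do not come from TileLang.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_kernel(batch_sizes, K, N):
    """Hypothetical stand-in for a per-configuration kernel factory.

    lru_cache hashes its arguments, so batch_sizes must be hashable
    (a tuple works, a list raises TypeError).
    """
    total_rows = sum(batch_sizes)
    return {"batch_sizes": batch_sizes, "shape": (total_rows, K, N)}


def grouped_gemm_entry(batch_sizes, K, N):
    """Hypothetical wrapper: normalize any sequence to a tuple first."""
    return build_kernel(tuple(batch_sizes), K, N)
```

With this normalization, `grouped_gemm_entry([64, 128], 32, 16)` and `grouped_gemm_entry((64, 128), 32, 16)` resolve to the same cached object, while calling `build_kernel` directly with a list fails.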
  2. 25 Jun, 2025 1 commit
    • [Example] Update examples to use @tilelang.jit (#597) · 3db18726
      Cunxiao Ni authored
      
      
      * [Example] Update kernel compilation in examples to use @tilelang.jit
      
      - Refactored multiple examples to stop using `tilelang.compile` for kernel creation and to invoke the decorated functions directly instead.
      - Added `@tilelang.jit` decorators with appropriate output indices to enhance performance and maintainability.
      - Improved code clarity by simplifying the kernel invocation process across various examples, ensuring consistency in how kernels are defined and executed.
      
      * format
      
      * Update example_tilelang_sparse_gqa_decode_varlen_indice.py
      
      * Update example_dequant_gemm_fine_grained.py
      
      * Update example_gemm_autotune.py
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
  3. 23 May, 2025 2 commits
    • [Dev] Add grouped GEMM backward example scripts (#515) · de028927
      Yu Cheng authored
      * Introduced `example_grouped_gemm_fwd.py` and `example_grouped_gemm_bwd.py` to demonstrate grouped matrix multiplication with forward and backward operations.
      * Implemented functions for grouped GEMM, input construction, and validation against PyTorch's implementation.
      * Added command-line argument parsing for flexible input configuration, including batch sizes and matrix dimensions.
      * Included a test function to validate the functionality with various input scenarios.
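To make "validation against PyTorch's implementation" concrete, here is a minimal reference for what a grouped GEMM computes, written in plain Python so it runs without any dependencies. It assumes a layout in which the rows of `a` are split into consecutive groups of the given sizes and each group is multiplied by its own weight matrix; that layout, and the names `matmul` and `grouped_gemm_reference`, are assumptions for illustration and are not taken from the example scripts.

```python
def matmul(a, b):
    """Dense matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def grouped_gemm_reference(a, b_groups, batch_sizes):
    """Multiply consecutive row groups of `a` by their matching matrix.

    a           : list of rows, length sum(batch_sizes)
    b_groups    : one K x N matrix per group
    batch_sizes : number of rows of `a` belonging to each group
    """
    if len(b_groups) != len(batch_sizes):
        raise ValueError("need one weight matrix per group")
    if sum(batch_sizes) != len(a):
        raise ValueError("batch sizes must cover every row of a")

    out = []
    start = 0
    for size, b in zip(batch_sizes, b_groups):
        # Rows [start, start + size) belong to this group.
        out.extend(matmul(a[start:start + size], b))
        start += size
    return out
```

A kernel under test would then be compared element by element (within a tolerance for floating-point inputs) against the output of a reference of this kind.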
    • [Dev] Add grouped GEMM example with TileLang and PyTorch integration (#514) · fb801940
      Yu Cheng authored
      * Introduced a new example script `example_grouped_gemm.py` demonstrating grouped matrix multiplication using TileLang and PyTorch.
      * Implemented functions for performing grouped GEMM, constructing inputs, and validating results against PyTorch's implementation.
      * Added command-line argument parsing for flexible input configuration, including batch sizes and matrix dimensions.
      * Included a test function to validate the grouped GEMM functionality with various input scenarios.
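The command-line parsing mentioned in these commits could look roughly like the sketch below, which uses only `argparse` from the standard library. The flag names (`--batch_sizes`, `--K`, `--N`), the comma-separated format, and the default values are all assumptions made for illustration; the actual example scripts may differ.

```python
import argparse


def parse_batch_sizes(text):
    """Turn a comma-separated string such as "64,128" into a tuple of ints."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def build_parser():
    """Build a parser with hypothetical flags for a grouped GEMM example."""
    parser = argparse.ArgumentParser(description="Grouped GEMM example (sketch)")
    parser.add_argument(
        "--batch_sizes",
        type=parse_batch_sizes,
        default=(64, 128),
        help="comma-separated number of rows per group",
    )
    parser.add_argument("--K", type=int, default=8192, help="reduction dimension")
    parser.add_argument("--N", type=int, default=8192, help="output columns")
    return parser
```

Parsing `["--batch_sizes", "32, 64", "--K", "16"]` with this parser yields a tuple of batch sizes and integer dimensions, ready to pass on to a kernel wrapper.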