Commits · 05f2fc6d30ed47e53a02e4b4a58bbf853f0cdb08 · OpenDAS / tilelang

31 Jul, 2025 1 commit

[Enhancement] Enhance warp specialization logic (#680) · 05f2fc6d

Yu Cheng authored Jul 31, 2025



- Removed unnecessary configurations from the `@tilelang.jit` decorator in `example_grouped_gemm_fwd.py`, simplifying the kernel compilation process.
- Updated the `grouped_gemm` function to accept a tuple for batch sizes, enhancing compatibility with the kernel invocation.
- Added logic in `warp_specialized_rewriter.cc` to track buffer usage in `CallNode` expressions, improving the handling of TMA load operations.

This refactor aims to streamline the code and improve maintainability while ensuring better performance in grouped matrix multiplication operations.

Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>


25 Jun, 2025 1 commit

[Example] Update examples to use @tilelang.jit (#597) · 3db18726

Cunxiao Ni authored Jun 25, 2025



* [Example] Update kernel compilation in examples to use @tilelang.jit

- Refactored multiple examples to eliminate the use of `tilelang.compile` for kernel creation, directly invoking the functions instead.
- Added `@tilelang.jit` decorators with appropriate output indices to enhance performance and maintainability.
- Improved code clarity by simplifying the kernel invocation process across various examples, ensuring consistency in how kernels are defined and executed.

* format

* Update example_tilelang_sparse_gqa_decode_varlen_indice.py

* Update example_dequant_gemm_fine_grained.py

* Update example_gemm_autotune.py

---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>


23 May, 2025 2 commits

[Dev] Add grouped GEMM backward example scripts (#515) · de028927

Yu Cheng authored May 23, 2025

* Introduced `example_grouped_gemm_fwd.py` and `example_grouped_gemm_bwd.py` to demonstrate grouped matrix multiplication with forward and backward operations.
* Implemented functions for grouped GEMM, input construction, and validation against PyTorch's implementation.
* Added command-line argument parsing for flexible input configuration, including batch sizes and matrix dimensions.
* Included a test function to validate the functionality with various input scenarios.
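
The command-line handling described above could look roughly like the following sketch. The flag names (`--batch_sizes`, `--K`, `--M`) and their defaults are assumptions made for illustration and are not taken from the example scripts; the only point shown is turning a comma-separated string into a tuple of per-group batch sizes.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical grouped-GEMM example options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Grouped GEMM example options (sketch, names are assumed)")
    # Comma-separated list of rows per group, e.g. "64,128".
    parser.add_argument("--batch_sizes", type=str, default="64,128",
                        help="comma-separated batch size of each group")
    # Shared matrix dimensions (assumed names).
    parser.add_argument("--K", type=int, default=8192, help="reduce dimension")
    parser.add_argument("--M", type=int, default=8192, help="output dimension")
    args = parser.parse_args(argv)
    # Convert the string into an immutable tuple of ints.
    args.batch_sizes = tuple(int(s) for s in args.batch_sizes.split(","))
    return args
```

Calling `parse_args(["--batch_sizes", "4,8,16"])` yields `batch_sizes == (4, 8, 16)`, while omitted flags fall back to the assumed defaults.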


[Dev] Add grouped GEMM example with TileLang and PyTorch integration (#514) · fb801940

Yu Cheng authored May 23, 2025

* Introduced a new example script `example_grouped_gemm.py` demonstrating grouped matrix multiplication using TileLang and PyTorch.
* Implemented functions for performing grouped GEMM, constructing inputs, and validating results against PyTorch's implementation.
* Added command-line argument parsing for flexible input configuration, including batch sizes and matrix dimensions.
* Included a test function to validate the grouped GEMM functionality with various input scenarios.
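
For readers unfamiliar with the operation, a grouped GEMM can be sketched in plain Python as below. This is a minimal reference under an assumed layout (the rows of all groups are stacked in one matrix `a`, and `b` holds one weight matrix per group); the function name and the layout are hypothetical and may differ from what `example_grouped_gemm.py` actually uses.

```python
def grouped_gemm_reference(a, b, batch_sizes):
    """Multiply each group's rows of `a` by that group's matrix in `b`.

    a:           list of rows, shape (sum(batch_sizes), K)
    b:           list of matrices, shape (num_groups, K, N)
    batch_sizes: tuple with the number of rows belonging to each group
    """
    if sum(batch_sizes) != len(a):
        raise ValueError("batch_sizes must sum to the number of rows in a")
    if len(batch_sizes) != len(b):
        raise ValueError("need exactly one weight matrix per group")

    out = []
    row = 0  # index of the first row of the current group
    for group, size in enumerate(batch_sizes):
        weight = b[group]  # K x N matrix for this group
        n_cols = len(weight[0])
        for vec in a[row:row + size]:
            # Ordinary row-times-matrix product.
            out.append([
                sum(x * weight[k][n] for k, x in enumerate(vec))
                for n in range(n_cols)
            ])
        row += size
    return out
```

A kernel's output can then be compared row by row against this reference, which is the same idea as validating against a PyTorch implementation but without any dependency.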
