- 05 Oct, 2025 1 commit
Cunxiao Ni authored
* [Example] Fix lint to improve grouped GEMM performance with TMA
* fix lint
- 31 Jul, 2025 1 commit
Yu Cheng authored
- Removed unnecessary configurations from the `@tilelang.jit` decorator in `example_grouped_gemm_fwd.py`, simplifying the kernel compilation process.
- Updated the `grouped_gemm` function to accept a tuple for batch sizes, enhancing compatibility with the kernel invocation.
- Added logic in `warp_specialized_rewriter.cc` to track buffer usage in `CallNode` expressions, improving the handling of TMA load operations.
This refactor aims to streamline the code and improve maintainability while ensuring better performance in grouped matrix multiplication operations.
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
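The commit does not say why a tuple is preferred for batch sizes. One plausible reason, offered here only as an assumption, is that JIT-style kernel caches key on their arguments, and a Python tuple is hashable while a list is not. The sketch uses hypothetical helpers `normalize_batch_sizes`, `batch_offsets`, and `build_kernel` (none of them are TileLang APIs), with `functools.lru_cache` standing in for a compilation cache.

```python
from functools import lru_cache
from itertools import accumulate
from typing import Sequence, Tuple


def normalize_batch_sizes(batch_sizes: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: convert any sequence of group sizes to a hashable tuple."""
    sizes = tuple(int(s) for s in batch_sizes)
    if any(s < 0 for s in sizes):
        raise ValueError("batch sizes must be non-negative")
    return sizes


def batch_offsets(batch_sizes: Tuple[int, ...]) -> Tuple[int, ...]:
    """Starting row of each group in the concatenated input (exclusive prefix sum)."""
    return tuple(accumulate(batch_sizes, initial=0))[:-1]


@lru_cache(maxsize=None)
def build_kernel(batch_sizes: Tuple[int, ...], k: int, n: int) -> str:
    """Hypothetical stand-in for an expensive, cached compilation step.

    lru_cache hashes every argument, so passing a list here raises TypeError,
    while a tuple works and lets repeated calls reuse the cached result.
    """
    return f"kernel(groups={len(batch_sizes)}, total_m={sum(batch_sizes)}, k={k}, n={n})"
```

For example, `batch_offsets(normalize_batch_sizes([2, 3, 4]))` evaluates to `(0, 2, 5)`, while `build_kernel([2, 3, 4], 8, 16)` fails with `TypeError: unhashable type: 'list'`.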
- 25 Jun, 2025 1 commit
Cunxiao Ni authored
* [Example] Update kernel compilation in examples to use `@tilelang.jit`
  - Refactored multiple examples to eliminate the use of `tilelang.compile` for kernel creation, directly invoking the functions instead.
  - Added `@tilelang.jit` decorators with appropriate output indices to enhance performance and maintainability.
  - Improved code clarity by simplifying the kernel invocation process across various examples, ensuring consistency in how kernels are defined and executed.
* format
* Update example_tilelang_sparse_gqa_decode_varlen_indice.py
* Update example_dequant_gemm_fine_grained.py
* Update example_gemm_autotune.py
---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
- 23 May, 2025 2 commits
Yu Cheng authored
* Introduced `example_grouped_gemm_fwd.py` and `example_grouped_gemm_bwd.py` to demonstrate grouped matrix multiplication with forward and backward passes.
* Implemented functions for grouped GEMM, input construction, and validation against PyTorch's implementation.
* Added command-line argument parsing for flexible input configuration, including batch sizes and matrix dimensions.
* Included a test function to validate the functionality with various input scenarios.
Yu Cheng authored
* Introduced a new example script `example_grouped_gemm.py` demonstrating grouped matrix multiplication using TileLang and PyTorch.
* Implemented functions for performing grouped GEMM, constructing inputs, and validating results against PyTorch's implementation.
* Added command-line argument parsing for flexible input configuration, including batch sizes and matrix dimensions.
* Included a test function to validate the grouped GEMM functionality with various input scenarios.
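Both commits describe validating the TileLang kernel against a PyTorch reference. As a dependency-free sketch of what such a reference computes, the code assumes a common grouped-GEMM layout: the rows of `A` for all groups are concatenated into one `sum(batch_sizes) x K` matrix, and group `g` multiplies its slice of rows by its own `K x N` matrix `B[g]`. The layout and the names `matmul` and `grouped_gemm_reference` are assumptions for illustration, not taken from the example scripts; a PyTorch version would presumably slice `A` the same way and call `torch.mm` once per group.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (m x k) @ (k x n) product using nested loops."""
    k = len(b)
    n = len(b[0]) if k else 0
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def grouped_gemm_reference(a: Matrix, b: Sequence[Matrix], batch_sizes: Sequence[int]) -> Matrix:
    """Hypothetical reference: multiply each group's row slice of `a` by its own `b[g]`."""
    if len(b) != len(batch_sizes):
        raise ValueError("need exactly one B matrix per group")
    if sum(batch_sizes) != len(a):
        raise ValueError("batch sizes must add up to the number of rows in A")
    out: Matrix = []
    start = 0
    for size, b_g in zip(batch_sizes, b):
        # Rows [start, start + size) of A belong to this group.
        out.extend(matmul(a[start:start + size], b_g))
        start += size
    return out
```

With `a = [[1, 2], [3, 4], [5, 6]]`, `b = [identity, 2 * identity]`, and `batch_sizes = (1, 2)`, the first row is multiplied by the identity and the last two rows are doubled, giving `[[1, 2], [6, 8], [10, 12]]`.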