- 04 Oct, 2025 1 commit
lijinpei authored
* [Example] Optimize online_softmax example
  - Y should be output in float16.
  - BN needs to be equal to N to be really online.
  - On my H100 machine, this increases the speedup from 1.424x to 2.788x.
* enhance

---------

Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
-
- 25 Jun, 2025 1 commit
Cunxiao Ni authored
* [Example] Update kernel compilation in examples to use @tilelang.jit
  - Refactored multiple examples to eliminate the use of `tilelang.compile` for kernel creation, directly invoking the functions instead.
  - Added `@tilelang.jit` decorators with appropriate output indices to enhance performance and maintainability.
  - Improved code clarity by simplifying the kernel invocation process across various examples, ensuring consistency in how kernels are defined and executed.
* format
* Update example_tilelang_sparse_gqa_decode_varlen_indice.py
* Update example_dequant_gemm_fine_grained.py
* Update example_gemm_autotune.py

---------

Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 23 Jun, 2025 1 commit
Jianqiao Lu authored
* feat: add an easy version for online softmax
* fix: set x & y to fragment memory to load data from global memory
* feat: apply format check
* Add License

---------

Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-