- 26 Nov, 2025 1 commit
Yunqian Fan authored
* feat: add fp8 variants; add placeholder for fp6/fp4 in meta
  support ld with pack for fp32 dtype
  add dump
  add template expand
  remove unused dtype and change to rebased apis
* fix: when atom-m!=128, enable_ws
* fix: typo in tcgen05 meta; dispatch in gemm sm100
-
- 24 Nov, 2025 1 commit
Lei Wang authored
This reverts commit 0d101c11.
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
-
- 21 Nov, 2025 1 commit
Yunqian Fan authored
support ld with pack for fp32 dtype
add dump
add template expand
remove unused dtype and change to rebased apis
-
- 02 Nov, 2025 1 commit
Lei Wang authored
* remove debug print
* pipeline fix
* use the correct buffer access scope
* rs support
* warp warpgroup_fence_operand
* fix
* fp8 dtype ptx enhance
* mma fix
* TCGEN05 Interface
* tcgen05 support
* rebase
* update
* Enhance TCGEN05 support by adding new intrinsic operations and descriptors. Introduced `ptx_tcgen05_mma_ts` for tensor-memory to shared-memory instructions and `tcgen05_mma_arrive` for signaling barrier completion. Updated existing descriptors and code generation logic to accommodate these changes, ensuring compatibility with new instruction sets. Refactored related allocation functions and improved handling of shared memory descriptors.
* lint fix
* Refactor buffer reference handling in CUDA code generation and update test execution in tilelang. Ensure default annotations for unrolling are set correctly in TIR IR module.
* wgmma fix
---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
-