Commits · 08262bcea4e6d59904374ea3ba0d3fc2742f25fb · OpenDAS / tilelang

11 Dec, 2025 1 commit

[Doc] Minor documentation update (#1410) · 08262bce

Lei Wang authored Dec 12, 2025

01 Dec, 2025 1 commit

[Language] support `T.gemm_sp_v2` on sm80 and sm89 (#1056) · 283a9a00

botbw authored Dec 01, 2025

* [misc] add a cpp side wrapper for gemm_sp_py

* [misc] typing

* [IR] bind GemmSPWarpPolicy

* [chore] add wrapper code

* [IR] fix GemmSPWarpPolicy

* [codegen] apply ptxas instructions

* [intrinsic] add typical (unused) mma layout

* [template] add uint16 debug func

* [intrinsic] add b matrix layout

* [gemm_sp] enable fp16/bf16 on sm8x

* [layout] refactor fp16/bf16 layout

* [gemm_sp] enable int8

* [chore] update test case dtype

* [gemm_sp] enable fp32

* [layout] refactor layouts

* [intrinsic] enable ldmatrix for mat A

* [layout] enable ldsm for matrix b

* [layout] add ldmatrix for fp32 and fp8

* [chore] refine

* [chore] refactor

* [chore] add fp8 efactor

* [chore] refactor

* [chore] add remove negative zero util

* [example] add a custom compress kernel

* [chore] minor update

* [test] refactor gemm_sp test

* [refactor] make metadata layout func

* [example] add option for using cutlass layout

* [doc] add a gemm_sp doc

* [doc] minor polish

* [chore] remove unused

* [bugfix] fix non replicate b case

* [test] refactor

* [chore] add a check

* [bugfix] fix util bug

* [wip] init a new test case for v2

* [chore] minor refactor

* [chore] minor update

* [bugfix] enable 16bit rs

* [language] enable rs

* [language] enable gemm_sp_sr

* [language] enable gemm_sp_rr

* [test] enable more tests

* [tvm] update ffi binding

* [chore] remove print

* [chore] fix benchmark script

* [lint] precommit lint

* [chore] apply feedback

* [test] use arch 8.0

* [chore] rollback ::ordered_metadata for backward compatibility

* [bugfix] fix capitalized

* [example] keep gemm_sp on hopper

* [test] fix no fp8 normal kernel

* [test] reduce matmul size to satisfy accum error

* [test] use cal_diff for assertion

* [bugfix] expand float8 type

* [lib] add make_int4 for short type

* [language] add transpose E

* [bugfix] fix wrong var

* [format] format

* [chore] refactor binding

* [chore] fix wrong passing var

12 Apr, 2025 1 commit

[Docs] Add AMD Flash MLA Documentation to Tutorials Section (#376) · 0997c333

Lei Wang authored Apr 12, 2025

* [Add] Introduce deepseek_mla documentation for high-performance FlashMLA with TileLang

- Added a comprehensive guide on writing high-performance kernels using TileLang, focusing on the Multi-Head Latent Attention (MLA) mechanism.
- Included benchmark results comparing FlashMLA, TileLang, Torch, Triton, and FlashInfer, highlighting TileLang's efficiency and ease of use.
- Detailed implementation strategies, including layout inference, threadblock swizzling, shared memory swizzling, and warp specialization.
- Provided examples and explanations of optimization techniques to enhance performance in GPU kernel programming.

* doc update

* [Add] Enhance AMD FlashMLA implementation and documentation

- Refactored variable names in `benchmark_mla_decode_amd_tilelang.py` for clarity, changing `Q_shared` and `Q_pe_shared` to `Q_local` and `Q_pe_local` to reflect their usage in register allocation.
- Added a new `README.md` detailing the high-performance FlashMLA implementation on AMD MI300X accelerators, including architectural considerations, optimization strategies, and performance evaluation.
- Introduced a performance comparison figure to illustrate the efficiency of the TileLang implementation against other frameworks.

* lint fix

* [Add] Expand deepseek_mla documentation for AMD MI300X optimization strategies

- Introduced a new section detailing architectural differences and optimization strategies for implementing FlashMLA on AMD MI300X accelerators.
- Highlighted key considerations such as instruction set variations, shared memory constraints, tile size flexibility, and memory bank conflict swizzling.
- Included performance evaluation results demonstrating TileLang's efficiency compared to other frameworks.
- Discussed future optimization opportunities for memory bank conflict mitigation and dimension parallelization.

02 Feb, 2025 1 commit

[Doc] Add matmul kernel tutorial documentations with tile library (#60) · ea612446

Lei Wang authored Feb 02, 2025

* implement jit test case

* [Dev] implement auto tune test case for matrix multiplication

* Implement test for legalize memory access and vectorized loop

* lint fix

* introduce run_once

* Refactor callback function names for consistency and improve code readability

* enhance documentations

* lint fix

* lint fix

* lint fix

* lint fix

* fix formatting issues in rt_mod_hip.cc

* add random seed initialization for deterministic testing

* Add documentation images and comprehensive GEMM tutorial for TileLang

* Update MATMUL documentation title to highlight Tile Library

26 Jan, 2025 1 commit

[Doc] Add debug relevant testing and documentations (#58) · 5e259239

Lei Wang authored Jan 26, 2025

* implement jit test case

* [Dev] implement auto tune test case for matrix multiplication

* Implement test for legalize memory access and vectorized loop

* lint fix

* introduce run_once

* Refactor callback function names for consistency and improve code readability

* enhance documentations

* lint fix

* lint fix

* lint fix

* lint fix

* fix formatting issues in rt_mod_hip.cc

* add random seed initialization for deterministic testing

24 Jan, 2025 1 commit

[Doc] Update documentation structure and content: add overview section, revise project name, and change theme to Furo (#39) · 3f38c10c

Lei Wang authored Jan 24, 2025

21 Jan, 2025 1 commit

[Doc] Use sphinx to generate docs. (#21) · 8e7feca9

Wenhao Xie authored Jan 22, 2025

* [Doc] Use sphinx to generate docs.

* [Doc] Fix a bug on tlcpack_sphinx_addon.

* [Doc] Fix linting issues.
