Commits · 0997c33310ed5e496c79a2b3c659cacc0a2aeba2 · OpenDAS / tilelang

12 Apr, 2025 1 commit

[Docs] Add AMD Flash MLA Documentation to Tutorials Section (#376) · 0997c333

Lei Wang authored Apr 12, 2025

* [Add] Introduce deepseek_mla documentation for high-performance FlashMLA with TileLang

- Added a comprehensive guide on writing high-performance kernels using TileLang, focusing on the Multi-Head Latent Attention (MLA) mechanism.
- Included benchmark results comparing FlashMLA, TileLang, Torch, Triton, and FlashInfer, highlighting TileLang's efficiency and ease of use.
- Detailed implementation strategies, including layout inference, threadblock swizzling, shared memory swizzling, and warp specialization.
- Provided examples and explanations of optimization techniques to enhance performance in GPU kernel programming.

* doc update

* [Add] Enhance AMD FlashMLA implementation and documentation

- Refactored variable names in `benchmark_mla_decode_amd_tilelang.py` for clarity, changing `Q_shared` and `Q_pe_shared` to `Q_local` and `Q_pe_local` to reflect their usage in register allocation.
- Added a new `README.md` detailing the high-performance FlashMLA implementation on AMD MI300X accelerators, including architectural considerations, optimization strategies, and performance evaluation.
- Introduced a performance comparison figure to illustrate the efficiency of the TileLang implementation against other frameworks.

* lint fix

* [Add] Expand deepseek_mla documentation for AMD MI300X optimization strategies

- Introduced a new section detailing architectural differences and optimization strategies for implementing FlashMLA on AMD MI300X accelerators.
- Highlighted key considerations such as instruction set variations, shared memory constraints, tile size flexibility, and memory bank conflict swizzling.
- Included performance evaluation results demonstrating TileLang's efficiency compared to other frameworks.
- Discussed future optimization opportunities for memory bank conflict mitigation and dimension parallelization.

02 Feb, 2025 1 commit

[Doc] Add matmul kernel tutorial documentations with tile library (#60) · ea612446

Lei Wang authored Feb 02, 2025

* implement jit test case

* [Dev] implement auto tune test case for matrix multiplication

* Implement test for legalize memory access and vectorized loop

* lint fix

* introduce run_once

* Refactor callback function names for consistency and improve code readability

* enhance documentations

* lint fix

* lint fix

* lint fix

* lint fix

* fix formatting issues in rt_mod_hip.cc

* add random seed initialization for deterministic testing

* Add documentation images and comprehensive GEMM tutorial for TileLang

* Update MATMUL documentation title to highlight Tile Library

26 Jan, 2025 1 commit

[Doc] Add debug relevant testing and documentations (#58) · 5e259239

Lei Wang authored Jan 26, 2025

* implement jit test case

* [Dev] implement auto tune test case for matrix multiplication

* Implement test for legalize memory access and vectorized loop

* lint fix

* introduce run_once

* Refactor callback function names for consistency and improve code readability

* enhance documentations

* lint fix

* lint fix

* lint fix

* lint fix

* fix formatting issues in rt_mod_hip.cc

* add random seed initialization for deterministic testing

24 Jan, 2025 1 commit

[Doc] Update documentation structure and content: add overview section, revise project name, and change theme to Furo (#39) · 3f38c10c

Lei Wang authored Jan 24, 2025

21 Jan, 2025 1 commit

[Doc] Use sphinx to generate docs. (#21) · 8e7feca9

Wenhao Xie authored Jan 22, 2025

* [Doc] Use sphinx to generate docs.

* [Doc] Fix a bug on tlcpack_sphinx_addon.

* [Doc] Fix linting issues.
