- 29 Sep, 2025 6 commits
-
-
Shengyu Liu authored
-
Shengyu Liu authored
-
Simon Mo authored
Signed-off-by: simon-mo <simon.mo@hey.com>
-
Shengyu Liu authored
Add Sparse Attention Kernels on Hopper
-
Shengyu Liu authored
-
Shengyu Liu authored
-
- 24 Sep, 2025 2 commits
-
-
Shengyu Liu authored
-
Shengyu Liu authored
-
- 22 Sep, 2025 1 commit
-
-
zhang authored
-
- 27 Aug, 2025 1 commit
-
-
Zeyu WANG authored
* fix calc space bug
* use python code to allocate the buffer for backward kernel
-
- 25 Aug, 2025 2 commits
-
-
Li Xiang authored
* get rid of cudaMalloc and cudaFree
* minor fix
---------
Co-authored-by: Jiashi Li <js.li@high-flyer.cn>
-
zhang authored
-
- 14 Aug, 2025 2 commits
- 01 Aug, 2025 1 commit
-
-
Zeyu WANG authored
* Add more GPU architectures support
* Merge fmha and mla runner
* add varlen & non varlen support, and add incontiguous tensor support
* update readme
* add varlen api
---------
Co-authored-by: dianzhangc <dianzhangc@nvidia.com>
-
- 29 Apr, 2025 2 commits
- 28 Apr, 2025 1 commit
-
-
ljss authored
-
- 23 Apr, 2025 2 commits
-
-
Shengyu Liu authored
-
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 authored
Thank you for open source FlashMLA! Just read the write up and very amazing work! Found some very minor mistakes regarding to typos, and the link to the FlashAttention-3 paper is wrong as that is the original FlashAttention paper, so I just send the PR here. Thanks again!
Signed-off-by: Hollow Man <hollowman@opensuse.org>
-
- 22 Apr, 2025 2 commits
-
-
Shengyu Liu authored
-
Shengyu Liu authored
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
-
- 01 Mar, 2025 2 commits
- 27 Feb, 2025 3 commits
- 26 Feb, 2025 4 commits
- 25 Feb, 2025 4 commits
-
-
yangsijia.614 authored
-
ljss authored
-
Jiashi Li authored
Support FP16 dtype in FlashMLA kernel
-
ljss authored
-
- 24 Feb, 2025 5 commits
-
-
Sijia Chen authored
-
Jiashi Li authored
feat: add benchmark for flash_infer vs flash_mla
-
Jiashi Li authored
Update docstring
-
zhengsize authored
-
chunyang.wen authored
-