- 25 Jan, 2026 1 commit
zhanghj2 authored
-
- 16 Jan, 2026 1 commit
Shengyu Liu authored
* Multiple updates and refactorings
* Remove dead code
-
- 24 Sep, 2025 1 commit
Shengyu Liu authored
-
- 01 Aug, 2025 1 commit
Zeyu WANG authored
* Add more GPU architectures support
* Merge fmha and mla runner
* add varlen & non-varlen support, and add non-contiguous tensor support
* update readme
* add varlen api
---------
Co-authored-by: dianzhangc <dianzhangc@nvidia.com>
-
- 22 Apr, 2025 1 commit
Shengyu Liu authored
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
-
- 24 Feb, 2025 1 commit
Jiashi Li authored
i
-