- 27 Jan, 2026 3 commits
- 26 Jan, 2026 5 commits
- 25 Jan, 2026 6 commits
- 20 Jan, 2026 1 commit
-
-
Shengyu Liu authored
-
- 19 Jan, 2026 1 commit
-
-
Jiashi Li authored
Co-authored-by: baowending.bwd <baowending.bwd@alibaba-inc.com>
-
- 16 Jan, 2026 1 commit
-
-
Shengyu Liu authored
* Multiple updates and refactorings
* Remove dead code
-
- 30 Sep, 2025 4 commits
-
-
Jiashi Li authored
-
Jiashi Li authored
-
Jiashi Li authored
-
Shengyu Liu authored
-
- 29 Sep, 2025 6 commits
-
-
Shengyu Liu authored
-
Shengyu Liu authored
-
Simon Mo authored
Signed-off-by: simon-mo <simon.mo@hey.com>
-
Shengyu Liu authored
Add Sparse Attention Kernels on Hopper
-
Shengyu Liu authored
-
Shengyu Liu authored
-
- 24 Sep, 2025 2 commits
-
-
Shengyu Liu authored
-
Shengyu Liu authored
-
- 22 Sep, 2025 1 commit
-
-
zhang authored
-
- 27 Aug, 2025 1 commit
-
-
Zeyu WANG authored
* fix calc space bug
* use python code to allocate the buffer for backward kernel
-
- 25 Aug, 2025 2 commits
-
-
Li Xiang authored
* get rid of cudaMalloc and cudaFree
* minor fix
---------
Co-authored-by: Jiashi Li <js.li@high-flyer.cn>
-
zhang authored
-
- 14 Aug, 2025 2 commits
- 01 Aug, 2025 1 commit
-
-
Zeyu WANG authored
* Add more GPU architectures support
* Merge fmha and mla runner
* add varlen & non varlen support, and add incontiguous tensor support
* update readme
* add varlen api
---------
Co-authored-by: dianzhangc <dianzhangc@nvidia.com>
-
- 29 Apr, 2025 2 commits
- 28 Apr, 2025 1 commit
-
-
ljss authored
-
- 23 Apr, 2025 1 commit
-
-
Shengyu Liu authored
-