Commits · 620f8769a112358b4fb4be170901d55687d26388 · OpenDAS / FlashMLA

27 Jan, 2026 3 commits
- Add modelv1 support; v32 partially passing, model1 changes not yet complete · 620f8769
  zhanghj2 authored Jan 27, 2026
  
  620f8769
- Use lambda functions to improve code structure · 6fb681fc
  zhanghj2 authored Jan 27, 2026
  
  6fb681fc
- Fix total_num_blocks calculation · 75f8262c
  zhanghj2 authored Jan 27, 2026
  
  75f8262c
26 Jan, 2026 5 commits
- Fix error when attn sink is disabled · 0ce8ee82
  zhanghj2 authored Jan 26, 2026
  
  0ce8ee82
- Support attn_sink · 200f01d5
  zhanghj2 authored Jan 26, 2026
  
  200f01d5
- Support attn_sink · 9b54b03c
  zhanghj2 authored Jan 26, 2026
  
  9b54b03c
- Add softmax · 5813dcc1
  zhanghj2 authored Jan 26, 2026
  
  5813dcc1
- Adapt the v32 decode kernel · 0e1300f7
  zhanghj2 authored Jan 26, 2026
  
  0e1300f7
25 Jan, 2026 6 commits
- Enable check_if_all_features_are_supported_and_abort · 7abe5160
  zhanghj2 authored Jan 25, 2026
  
  7abe5160
- Adapt the combine kernel · 755d8be7
  zhanghj2 authored Jan 25, 2026
  
  755d8be7
- Adapt to the DCU card architecture · 572946f5
  zhanghj2 authored Jan 25, 2026
  
  572946f5
- Adapt s_trap · 8b0ec03c
  zhanghj2 authored Jan 25, 2026
  
  8b0ec03c
- Adapt get_decoding_sched_meta · 7fdeaaa8
  zhanghj2 authored Jan 25, 2026
  
  7fdeaaa8
- Empty kernel now compiles · e2e0225c
  zhanghj2 authored Jan 25, 2026
  
  e2e0225c
20 Jan, 2026 1 commit
- nits · 48c6dc42
  Shengyu Liu authored Jan 20, 2026
  
  48c6dc42
19 Jan, 2026 1 commit
- Add missing include<span> · c741387b
  Jiashi Li authored Jan 19, 2026
```
Co-authored-by: baowending.bwd <baowending.bwd@alibaba-inc.com>
```
  c741387b
16 Jan, 2026 1 commit
- Multiple updates and refactorings (#150) · 082094b7
  Shengyu Liu authored Jan 16, 2026
```
* Multiple updates and refactorings

* Remove dead code
```
  082094b7
30 Sep, 2025 4 commits
- Update README · 1408756a
  Jiashi Li authored Oct 01, 2025
  
  1408756a
- Code format · 1858932a
  Jiashi Li authored Sep 30, 2025
  
  1858932a
- Fix error message · 7f55c715
  Jiashi Li authored Sep 30, 2025
  
  7f55c715
- Update blog and README · e9b67321
  Shengyu Liu authored Sep 30, 2025
  
  e9b67321
29 Sep, 2025 6 commits
- Rename deep dive blog · 42f3c578
  Shengyu Liu authored Sep 29, 2025
  
  42f3c578
- Add Deep-Dive Blog for the New Sparse Decoding Kernel on Hopper (#100) · 472477e8
  Shengyu Liu authored Sep 29, 2025
  
  472477e8
- Add Sparse Decoding Kernel and Sparse Prefill Kernel for Blackwell · fd249aac
  Simon Mo authored Sep 29, 2025
```
Signed-off-by: simon-mo <simon.mo@hey.com>
```
  fd249aac
- Merge pull request #98 from deepseek-ai/open-source-h · 17944550
  Shengyu Liu authored Sep 29, 2025
```
Add Sparse Attention Kernels on Hopper
```
  17944550
- Merge remote-tracking branch 'github/main' into open-source-h · 3969f20b
  Shengyu Liu authored Sep 29, 2025
  
  3969f20b
- Fill in link to DSv3.2 paper · 7232d69d
  Shengyu Liu authored Sep 29, 2025
  
  7232d69d
24 Sep, 2025 2 commits
- Add a comment · 87709cf4
  Shengyu Liu authored Sep 24, 2025
  
  87709cf4
- Reorganize files and add sparse prefill/decoding kernels on hopper · c28eca99
  Shengyu Liu authored Sep 24, 2025
  
  c28eca99
22 Sep, 2025 1 commit
- Refine handling for q/v sequence length equals zero. (#92) · ebf30641
  zhang authored Sep 22, 2025
  
  ebf30641
27 Aug, 2025 1 commit
- 261330bb
  Zeyu WANG authored Aug 27, 2025
  * fix calc space bug
  * use python code to allocate the buffer for backward kernel
  261330bb
25 Aug, 2025 2 commits
- Remove cudaMalloc and cudaFree in backward (#87) · eb758335
  Li Xiang authored Aug 25, 2025
```
* get rid of cudaMalloc and cudaFree

* minor fix

---------
Co-authored-by: Jiashi Li <js.li@high-flyer.cn>
```
  eb758335
- Remove tma padding for fwd inputs (#85) · 2d291b0c
  zhang authored Aug 25, 2025
  
  2d291b0c
14 Aug, 2025 2 commits
- Fix accuracy issue in sum_OdO kernel · c7590278
  Jiashi Li authored Aug 14, 2025
  
  c7590278
- Drop support for CUDA <12.8 · ef5b1a69
  Jiashi Li authored Aug 14, 2025
  
  ef5b1a69
01 Aug, 2025 1 commit
- Add more GPU architectures support (#76) · 41b611f7
  Zeyu WANG authored Aug 01, 2025
  * Add more GPU architectures support
  * Merge fmha and mla runner
  * add varlen & non varlen support, and add incontiguous tensor support
  * update readme
  * add varlen api
  ---------
  Co-authored-by: dianzhangc <dianzhangc@nvidia.com>
  41b611f7
29 Apr, 2025 2 commits
- update .gitignore · 9edee0c0
  ljss authored Apr 29, 2025
  
  9edee0c0
- update to cutlass 3.9 · 9c5dfab6
  ljss authored Apr 29, 2025
  
  9c5dfab6
28 Apr, 2025 1 commit
- Fix synchronization issues · 01a27728
  ljss authored Apr 28, 2025
  
  01a27728
23 Apr, 2025 1 commit
- Fix LaTeX render error (#74) · 70b94685
  Shengyu Liu authored Apr 23, 2025
  
  70b94685