Commits · 6fb681fcb3d77d5ed5ccc8ff3d3e5e447272895b · OpenDAS / FlashMLA

27 Jan, 2026 2 commits
- Refactor code structure with lambda functions · 6fb681fc
  zhanghj2 authored Jan 27, 2026
  
  6fb681fc
- Fix total_num_blocks calculation · 75f8262c
  zhanghj2 authored Jan 27, 2026
  
  75f8262c
26 Jan, 2026 5 commits
- Fix error when attn sink is disabled · 0ce8ee82
  zhanghj2 authored Jan 26, 2026
  
  0ce8ee82
- Support attn_sink · 200f01d5
  zhanghj2 authored Jan 26, 2026
  
  200f01d5
- Support attn_sink · 9b54b03c
  zhanghj2 authored Jan 26, 2026
  
  9b54b03c
- Add softmax · 5813dcc1
  zhanghj2 authored Jan 26, 2026
  
  5813dcc1
- Adapt the v32 decode kernel · 0e1300f7
  zhanghj2 authored Jan 26, 2026
  
  0e1300f7
25 Jan, 2026 6 commits
- open check_if_all_features_are_supported_and_abort · 7abe5160
  zhanghj2 authored Jan 25, 2026
  
  7abe5160
- Adapt the combine kernel · 755d8be7
  zhanghj2 authored Jan 25, 2026
  
  755d8be7
- Adapt to the DCU card architecture · 572946f5
  zhanghj2 authored Jan 25, 2026
  
  572946f5
- Adapt s_trap · 8b0ec03c
  zhanghj2 authored Jan 25, 2026
  
  8b0ec03c
- Adapt get_decoding_sched_meta · 7fdeaaa8
  zhanghj2 authored Jan 25, 2026
  
  7fdeaaa8
- Empty kernel now compiles · e2e0225c
  zhanghj2 authored Jan 25, 2026
  
  e2e0225c
20 Jan, 2026 1 commit
- nits · 48c6dc42
  Shengyu Liu authored Jan 20, 2026
  
  48c6dc42
19 Jan, 2026 1 commit
- Add missing include<span> · c741387b
  Jiashi Li authored Jan 19, 2026
```
Co-authored-by: baowending.bwd <baowending.bwd@alibaba-inc.com>
```
  c741387b
16 Jan, 2026 1 commit
- Multiple updates and refactorings (#150) · 082094b7
  Shengyu Liu authored Jan 16, 2026
```
* Multiple updates and refactorings

* Remove dead code
```
  082094b7
30 Sep, 2025 4 commits
- Update README · 1408756a
  Jiashi Li authored Oct 01, 2025
  
  1408756a
- Code format · 1858932a
  Jiashi Li authored Sep 30, 2025
  
  1858932a
- Fix error message · 7f55c715
  Jiashi Li authored Sep 30, 2025
  
  7f55c715
- Update blog and README · e9b67321
  Shengyu Liu authored Sep 30, 2025
  
  e9b67321
29 Sep, 2025 6 commits
- Rename deep dive blog · 42f3c578
  Shengyu Liu authored Sep 29, 2025
  
  42f3c578
- Add Deep-Dive Blog for the New Sparse Decoding Kernel on Hopper (#100) · 472477e8
  Shengyu Liu authored Sep 29, 2025
  
  472477e8
- Add Sparse Decoding Kernel and Sparse Prefill Kernel for Blackwell · fd249aac
  Simon Mo authored Sep 29, 2025
```
Signed-off-by: simon-mo <simon.mo@hey.com>
```
  fd249aac
- Merge pull request #98 from deepseek-ai/open-source-h · 17944550
  Shengyu Liu authored Sep 29, 2025
```
Add Sparse Attention Kernels on Hopper
```
  17944550
- Merge remote-tracking branch 'github/main' into open-source-h · 3969f20b
  Shengyu Liu authored Sep 29, 2025
  
  3969f20b
- Fill in link to DSv3.2 paper · 7232d69d
  Shengyu Liu authored Sep 29, 2025
  
  7232d69d
24 Sep, 2025 2 commits
- Add a comment · 87709cf4
  Shengyu Liu authored Sep 24, 2025
  
  87709cf4
- Reorganize files and add sparse prefill/decoding kernels on hopper · c28eca99
  Shengyu Liu authored Sep 24, 2025
  
  c28eca99
22 Sep, 2025 1 commit
- Refine handling for q/v sequence length equals zero. (#92) · ebf30641
  zhang authored Sep 22, 2025
  
  ebf30641
27 Aug, 2025 1 commit

- Zeyu WANG authored Aug 27, 2025

  * fix calc space bug
  * use python code to allocate the buffer for backward kernel

  261330bb

25 Aug, 2025 2 commits
- Remove cudaMalloc and cudaFree in backward (#87) · eb758335
  Li Xiang authored Aug 25, 2025
```
* get rid of cudaMalloc and cudaFree

* minor fix

---------
Co-authored-by: Jiashi Li <js.li@high-flyer.cn>
```
  eb758335
- Remove tma padding for fwd inputs (#85) · 2d291b0c
  zhang authored Aug 25, 2025
  
  2d291b0c
14 Aug, 2025 2 commits
- Fix accuracy issue in sum_OdO kernel · c7590278
  Jiashi Li authored Aug 14, 2025
  
  c7590278
- Drop support for CUDA <12.8 · ef5b1a69
  Jiashi Li authored Aug 14, 2025
  
  ef5b1a69
01 Aug, 2025 1 commit

- Add more GPU architectures support (#76) · 41b611f7
  Zeyu WANG authored Aug 01, 2025

  * Add more GPU architectures support
  * Merge fmha and mla runner
  * add varlen & non varlen support, and add non-contiguous tensor support
  * update readme
  * add varlen api

  ---------
  Co-authored-by: dianzhangc <dianzhangc@nvidia.com>

  41b611f7

29 Apr, 2025 2 commits
- update .gitignore · 9edee0c0
  ljss authored Apr 29, 2025
  
  9edee0c0
- update to cutlass 3.9 · 9c5dfab6
  ljss authored Apr 29, 2025
  
  9c5dfab6
28 Apr, 2025 1 commit
- Fix synchronization issues · 01a27728
  ljss authored Apr 28, 2025
  
  01a27728
23 Apr, 2025 2 commits

- Fix LaTeX render error (#74) · 70b94685
  Shengyu Liu authored Apr 23, 2025

  70b94685
- Minor fix to the docs to correct FlashAttention-3's paper link and typos (#73) · 6cff5a73
  ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 authored Apr 23, 2025

  Thank you for open-sourcing FlashMLA! I just read the write-up, and it is
  amazing work! I found a few very minor typos, and the link to the
  FlashAttention-3 paper is wrong: it points to the original FlashAttention
  paper, so I am sending this PR. Thanks again!
  Signed-off-by: Hollow Man <hollowman@opensuse.org>

  6cff5a73