OpenDAS
FlashMLA
Commits
34489f466e8f6ddf3b7318cd1556a5df8759c005
flashmla
27 Feb, 2026
6 commits
Optimize code
· 34489f46
zhanghj2
authored
Feb 27, 2026
34489f46
Improve fp8 tp1 performance
· 98b7c697
zhanghj2
authored
Feb 27, 2026
98b7c697
Optimize the warp summation for tp1
· 24c52aee
zhanghj2
authored
Feb 27, 2026
24c52aee
Remove __syncthreads and branches
· 6a04965a
zhanghj2
authored
Feb 27, 2026
6a04965a
assert FLASH_MLA_ASM_DIR
· ae382f02
zhanghj2
authored
Feb 27, 2026
ae382f02
Restore support for the old interface
· c353b35b
zhanghj2
authored
Feb 27, 2026
c353b35b
25 Feb, 2026
1 commit
Restore the previous test method
· c566af36
zhanghj2
authored
Feb 25, 2026
c566af36
24 Feb, 2026
6 commits
Update the performance test method to benchmark only flash_fwd_splitkv_mla_qkvfp8_kernel
· 732079ea
zhanghj2
authored
Feb 24, 2026
732079ea
Optimize softmax computation
· a4fdef4c
zhanghj2
authored
Feb 24, 2026
a4fdef4c
Use FLASH_MLA_BF16_TYPE to control bf16 conversion precision
· 3a477917
zhanghj2
authored
Feb 24, 2026
3a477917
Change to FLASH_MLA_OPT
· 4c0bb04e
zhanghj2
authored
Feb 24, 2026
4c0bb04e
Change smxx to gfx9
· 59487e20
zhanghj2
authored
Feb 24, 2026
59487e20
Change sm90 to gfx93
· f298a271
zhanghj2
authored
Feb 24, 2026
f298a271
22 Feb, 2026
1 commit
Support nhead<16
· a8393a04
zhanghj2
authored
Feb 22, 2026
a8393a04
21 Feb, 2026
2 commits
Fix gfx936 bug
· 945ced44
zhanghj2
authored
Feb 21, 2026
945ced44
Use round_half_ulp_truncate for float-to-bf16 conversion
· 60dfab33
zhanghj2
authored
Feb 21, 2026
60dfab33
11 Feb, 2026
4 commits
Add architecture checks to the interface
· 68971b5c
zhanghj2
authored
Feb 11, 2026
68971b5c
Remove mha test cases
· 68055db7
zhanghj2
authored
Feb 11, 2026
68055db7
Add test cases
· 611e6922
zhanghj2
authored
Feb 11, 2026
611e6922
Support software fp8 e5m2 for kv
· 892f7274
zhanghj2
authored
Feb 11, 2026
892f7274
06 Feb, 2026
5 commits
Add version information
· 11e445c3
zhanghj2
authored
Feb 06, 2026
11e445c3
Optimize combine
· b1ba831f
zhanghj2
authored
Feb 06, 2026
b1ba831f
Optimize combine
· 91691124
zhanghj2
authored
Feb 06, 2026
91691124
Support nmz fp8
· c4412432
zhanghj2
authored
Feb 06, 2026
c4412432
Support nmz qkvfp8
· 26d2ab19
zhanghj2
authored
Feb 06, 2026
26d2ab19
04 Feb, 2026
2 commits
Set up the framework for software fp8 e5m2
· 3eb7071c
zhanghj2
authored
Feb 04, 2026
3eb7071c
Set up the framework for qkvfp8 support
· 4976cbaa
zhanghj2
authored
Feb 04, 2026
4976cbaa
03 Feb, 2026
2 commits
Make testing and packaging easier
· 06612d65
zhanghj2
authored
Feb 03, 2026
06612d65
Support pure bf16
· 2033d805
zhanghj2
authored
Feb 03, 2026
2033d805
30 Jan, 2026
4 commits
Modify the output write-out
· 58b43d4a
zhanghj2
authored
Jan 30, 2026
58b43d4a
Read scale using buffer load
· d6379e50
zhanghj2
authored
Jan 30, 2026
d6379e50
Read q using buffer load lds; reduce VGPR spilling
· bdf0140b
zhanghj2
authored
Jan 30, 2026
bdf0140b
Save the assembly generated during compilation
· 515dbd44
zhanghj2
authored
Jan 30, 2026
515dbd44
29 Jan, 2026
7 commits
Support head16 in sparse decode
· 0651671f
zhanghj2
authored
Jan 29, 2026
0651671f
Support head 16 in prefill
· b94fdd0f
zhanghj2
authored
Jan 29, 2026
b94fdd0f
Reduce LDS usage to increase parallelism
· 38421051
zhanghj2
authored
Jan 29, 2026
38421051
Reduce LDS usage
· 6d68e3d1
zhanghj2
authored
Jan 29, 2026
6d68e3d1
938 architecture
· 5d62c0d7
zhanghj2
authored
Jan 29, 2026
5d62c0d7
Use 64-bit address computation to avoid overflow at large sizes
· 7a8722d7
zhanghj2
authored
Jan 29, 2026
7a8722d7
Set gMax_logits=-inf when topk_length=0
· d1c9d3fa
zhanghj2
authored
Jan 29, 2026
d1c9d3fa