- 23 Apr, 2026 1 commit
-
-
laibao authored
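Purpose: fix config-parsing compatibility issues for Qwen3.5 / Qwen3.5-MoE after upgrading transformers, and refine the routing strategy for unified attention on ROCm so that prefill and decode do not land on different implementation paths, reducing the cost of later debugging and of inconsistent behavior.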
-
- 22 Apr, 2026 2 commits
- 11 Apr, 2026 1 commit
-
-
laibao authored
-
- 10 Apr, 2026 3 commits
- 08 Apr, 2026 2 commits
- 03 Apr, 2026 2 commits
- 02 Apr, 2026 2 commits
- 01 Apr, 2026 3 commits
- 28 Mar, 2026 1 commit
-
-
wanglong3 authored
-
- 27 Mar, 2026 3 commits
-
-
flyingdown authored
-
laibao authored
-
flyingdown authored
-
- 26 Mar, 2026 6 commits
-
-
laibao authored
-
laibao authored
feat(v1 attention): wire the unified KV layout into ROCm FlashAttention, and pass through mm_prefix, qq_bias, and use_alibi_sqrt
Add unified KV layout selection logic to the ROCm FlashAttention backend.
Hook up the unified varlen kernel call path.
Add mm_prefix_range and qq_bias pass-through to the FlashAttention metadata.
-
wanghl6 authored
-
wanghl6 authored
-
wanghl6 authored
-
wanglong3 authored
-
- 24 Mar, 2026 6 commits
- 23 Mar, 2026 1 commit
-
-
guanyu1 authored
-
- 21 Mar, 2026 6 commits
- 20 Mar, 2026 1 commit
-
-
laibao authored
-