- 01 Apr, 2026 5 commits
- 30 Mar, 2026 1 commit
-
-
zhangzbb authored
[BUGFIX] Fix the incorrect epsilon argument order for fused RMS RoPE in Qwen3-MoE Attention See merge request dcutoolkit/deeplearing/vllm!539
-
- 28 Mar, 2026 2 commits
- 27 Mar, 2026 4 commits
-
-
flyingdown authored
-
laibao authored
-
flyingdown authored
Use tuned w4a16 MoE See merge request dcutoolkit/deeplearing/vllm!535
-
flyingdown authored
-
- 26 Mar, 2026 9 commits
-
-
wangmin6 authored
feat(v1 attention): add unified kv layout support to ROCm FlashAttention and pass through mm_prefix, qq_bias, and use_alibi_sqrt See merge request dcutoolkit/deeplearing/vllm!526
-
laibao authored
-
laibao authored
feat(v1 attention): add unified kv layout support to ROCm FlashAttention and pass through mm_prefix, qq_bias, and use_alibi_sqrt. Add unified KV layout selection logic to the ROCm FlashAttention backend. Wire in the unified varlen kernel call path. Pass mm_prefix_range and qq_bias through the FlashAttention metadata.
-
wangmin6 authored
Force VLLM_W8A8_BACKEND == 1 on the gfx928 architecture See merge request dcutoolkit/deeplearing/vllm!533
-
wangmin6 authored
glm5 fused operator optimization See merge request dcutoolkit/deeplearing/vllm!534
-
wanghl6 authored
-
wanghl6 authored
-
wanghl6 authored
-
wanglong3 authored
-
- 25 Mar, 2026 3 commits
-
-
wangmin6 authored
Support RMS_ROPE_CONCAT for fp8_e4m3 KV cache See merge request dcutoolkit/deeplearing/vllm!531
-
wangmin6 authored
Fix the AWQ inference bug caused by VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD See merge request dcutoolkit/deeplearing/vllm!528
-
wangmin6 authored
fix(moe): add the missing shared_output/routed_scaling_factor pass-through in non-Marlin quantization paths See merge request dcutoolkit/deeplearing/vllm!529
-
- 24 Mar, 2026 7 commits
- 23 Mar, 2026 4 commits
-
-
wangmin6 authored
Remove 1d_mrope See merge request dcutoolkit/deeplearing/vllm!525
-
guanyu1 authored
-
wangmin6 authored
Disable padding of sparse_mla num_head to 64/128, and add an environment variable to control the fp8_use_mixed_batch mode, FP8_USE_MI... See merge request dcutoolkit/deeplearing/vllm!524
-
yangql authored
# Conflicts: # vllm/model_executor/layers/sparse_attn_indexer.py # vllm/v1/attention/backends/mla/flashmla_sparse.py
-
- 21 Mar, 2026 5 commits