Commits · 94823af173df7d2e49de07cbb077e277438de5e0 · OpenDAS / vllm_cscc

14 Apr, 2026 3 commits
- Revert "feat: add step3.5-mtp3 support" · 94823af1
  laibao authored Apr 14, 2026
```
This reverts commit a1f4d869.
```
  94823af1
- feat: add step3.5-mtp3 support · a1f4d869
  laibao authored Apr 11, 2026
  
  a1f4d869
- [BUGFIX] Fix the compressed tensors FP8 MoE path not passing through the i_q/i_s parameters · feadffce
  laibao authored Apr 14, 2026
  
  feadffce
10 Apr, 2026 3 commits
- [FEATURE] Integrate LightOP's silu_and_mul custom op and unify the OPT path · 824dde97
  laibao authored Apr 10, 2026
```
In vllm/model_executor/layers/activation.py, adjust SiluAndMul.forward_cuda: when VLLM_USE_OPT_OP=1, always go through ops.silu_and_mul_opt_lightop(x)
In vllm/_custom_ops.py, add and register silu_and_mul_opt_lightop (including fake_impl) so that the compiled and non-compiled paths call it uniformly
```
  824dde97
- Update vllm/model_executor/layers/layernorm.py, vllm/_custom_ops.py · ce47a56e
  yangyn authored Apr 03, 2026
  
  ce47a56e
- fused_add_rms_norm use lightop · 15883da4
  fanwl authored Apr 02, 2026
  
  15883da4
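
The dispatch described in commit 824dde97 can be sketched roughly as follows. This is a minimal pure-Python illustration, not the repository's code: `silu_and_mul_reference`, `silu_and_mul_opt`, and `forward` are hypothetical names, the "optimized" function is only a stand-in for a LightOP kernel, and reading the flag directly from `os.environ` is an assumption about how `VLLM_USE_OPT_OP` is consumed.

```python
import math
import os


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def silu_and_mul_reference(x: list[float]) -> list[float]:
    """Reference semantics: silu(x[:d]) * x[d:], where d = len(x) // 2."""
    d = len(x) // 2
    return [silu(gate) * up for gate, up in zip(x[:d], x[d:])]


def silu_and_mul_opt(x: list[float]) -> list[float]:
    """Hypothetical stand-in for an optimized kernel; same math as the reference."""
    return silu_and_mul_reference(x)


def forward(x: list[float]) -> list[float]:
    # Assumption: the flag is read from the process environment at call time.
    if os.environ.get("VLLM_USE_OPT_OP", "0") == "1":
        return silu_and_mul_opt(x)
    return silu_and_mul_reference(x)
```

Both branches must return the same values for the same input; the flag only selects which implementation runs.
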
08 Apr, 2026 2 commits
- [FEATURE] DuSwiftConnector: support PD for the glm5 model (attention sparse_attn_indexer layer_name change) · bcb2ba6c
  xiabo authored Apr 08, 2026
  
  bcb2ba6c
- [BUGFIX] Adapt the rms_quant fusion to DSA · a05d749e
  wujl5 authored Apr 08, 2026
  
  a05d749e
01 Apr, 2026 1 commit
- [BUGFIX] Fix incomplete pass-through of shared_output and routed_scaling_factor in the fused MoE modular kernel path · b281794e
  laibao authored Apr 01, 2026
  
  b281794e
27 Mar, 2026 2 commits
- fix get config assert error · ca158ae9
  flyingdown authored Mar 27, 2026
  
  ca158ae9
- use tuning w4a16 moe · 6adf9d12
  flyingdown authored Mar 27, 2026
  
  6adf9d12
26 Mar, 2026 2 commits
- per_token_group_quant_fp8 opt · a0ac95b0
  wanghl6 authored Mar 26, 2026
  
  a0ac95b0
- topk opt · cb68935c
  wanghl6 authored Mar 26, 2026
  
  cb68935c
24 Mar, 2026 4 commits
- Support kvcache fp8_e4m3/fp8_e5m2 · 442abc67
  xiabo authored Mar 24, 2026
```
Support RMS_ROPE_CONCAT for kvcache fp8_e4m3/fp8_e5m2
```
  442abc67
- fix(moe): complete shared_output/routed_scaling_factor pass-through for the non-Marlin quantization paths · 6ef5d322
  laibao authored Mar 24, 2026
  
  6ef5d322
- Fix the AWQ inference bug caused by VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD · beae085a
  yangql authored Mar 24, 2026
  
  beae085a
- Support kvcache fp8_e4m3/fp8_e5m2 · 0e5a20b3
  xiabo authored Mar 24, 2026
```
Support RMS_ROPE_CONCAT for kvcache fp8_e4m3/fp8_e5m2
```
  0e5a20b3
21 Mar, 2026 4 commits
- Fix the get_gcn_arch_name import bug · 53889c8b
  yangql authored Mar 21, 2026
  
  53889c8b
- Fix the get_gcn_arch_name import bug · 7c8db5e7
  yangql authored Mar 21, 2026
  
  7c8db5e7
- Add kcache read/write operations for the Triton indexer · 656944ac
  yangql authored Mar 21, 2026
  
  656944ac
- [perf] Support mtp>1 for DSA-architecture models · 7eb2446c
  王敏 authored Mar 21, 2026
  
  7eb2446c
20 Mar, 2026 1 commit
- fix(moe): pass through shared_output only when fused moe_sum+mul+add is enabled · 839dc88e
  laibao authored Mar 20, 2026
  
  839dc88e
19 Mar, 2026 1 commit

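- feat(moe): fix shared_output pass-through being overwritten and support the torch.compile startup path · eb933fe1
  laibao authored Mar 19, 2026
  Remove the rewriting of experts.use_overlapped/_shared_experts state in forward, avoiding inconsistent shared/non-shared paths during torch.compile startup.
  FusedMoE.forward_impl computes the shared experts only when shared_output is empty, so the passed-through value is not overwritten by a local recomputation.
  eb933fe1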
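
The guard described in commit eb933fe1 might look roughly like the sketch below. It is an illustration under stated assumptions only: `forward_impl` here is a hypothetical free function rather than the actual `FusedMoE.forward_impl` method, the experts are modeled as plain callables, and treating "empty" as `None` is an assumption.

```python
from typing import Callable, Optional

Vector = list[float]


def forward_impl(
    hidden: Vector,
    routed_experts: Callable[[Vector], Vector],
    shared_experts: Callable[[Vector], Vector],
    shared_output: Optional[Vector] = None,
) -> tuple[Vector, Vector]:
    """Return (shared_output, routed_output) for one token."""
    # Only compute the shared experts when no value was passed through,
    # so a precomputed shared_output is never overwritten.
    if shared_output is None:
        shared_output = shared_experts(hidden)
    routed_output = routed_experts(hidden)
    return shared_output, routed_output
```

With this shape, a caller that already computed `shared_output` upstream pays for the shared experts once, and the value it passed in is the value that comes back.
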

18 Mar, 2026 3 commits

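- feat(moe): add the LightOP moe_sum+mul+add fusion and wire through parameter pass-through · 0639678c
  laibao authored Mar 18, 2026
  Add the environment variable VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD to toggle the fused sum+mul+add path.
  In DeepseekV2MoE, add a fused path that precomputes shared_output and passes iqis and routed_scaling_factor down.
  Extend FusedMoE/SharedFusedMoE and the related custom op interfaces to pass i_q/i_s/shared_output/routed_scaling_factor through uniformly.
  Adapt the Triton, Marlin W16A16, SlimQuant W4A8, CompressedTensors W8A8, and other implementations in step, so that sum+mul+add can be completed inside the kernel.
  0639678c
- feat: support an fp8 implementation of mqa · b5323d90
  lixh6 authored Mar 18, 2026
  
  b5323d90
- Integrate the mla_cat op; it takes effect only with nmz and kvcache-fp8, is disabled by default, and is enabled with export VLLM_USE_CAT_MLA=1 · 3bff7958
  yangql authored Mar 18, 2026
  
  3bff7958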
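
For readers unfamiliar with the sum+mul+add fusion named in commit 0639678c, the arithmetic can be sketched as below. This is an assumed reading of the commit message, not the LightOP kernel: it takes the three steps to be a sum over the top-k expert outputs, a multiply by `routed_scaling_factor`, and an add of `shared_output`. Both function names are hypothetical, and plain Python lists stand in for tensors.

```python
Vector = list[float]


def moe_sum_mul_add_unfused(
    expert_outputs: list[Vector],
    routed_scaling_factor: float,
    shared_output: Vector,
) -> Vector:
    """Three separate passes: sum over experts, scale, then add shared output."""
    summed = [sum(col) for col in zip(*expert_outputs)]      # moe_sum
    scaled = [v * routed_scaling_factor for v in summed]     # mul
    return [a + b for a, b in zip(scaled, shared_output)]    # add


def moe_sum_mul_add_fused(
    expert_outputs: list[Vector],
    routed_scaling_factor: float,
    shared_output: Vector,
) -> Vector:
    """Single pass per element, as a fused kernel would do it."""
    out = []
    for j, shared in enumerate(shared_output):
        acc = 0.0
        for expert in expert_outputs:
            acc += expert[j]
        out.append(acc * routed_scaling_factor + shared)
    return out
```

The fused form touches each output element once instead of materializing two intermediate vectors, which is the usual motivation for doing this work inside the kernel.
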

16 Mar, 2026 2 commits
- fix: resolve block_shape conflicts between DeepEP MoE and non-DeepEP quantization · f9a04c97
  chenhw5 authored Mar 16, 2026
  
  f9a04c97
- perf: add rmsQuant calls in MOE for GLM4.7; fix: resolve the error caused by fused_moe passing None downstream · 0f6b9a19
  wujl5 authored Mar 16, 2026
  
  0f6b9a19
13 Mar, 2026 3 commits
- Fix the rms_norm_opt precision issue · 9404668a
  guanyu1 authored Mar 13, 2026
  
  9404668a
- fix: fix the impact of MOE quantized tensors on other models · 8e726b3f
  wujl5 authored Mar 13, 2026
  
  8e726b3f
- Modify the sparse_attn hip backend · 7cec75a7
  liuchy5 authored Mar 13, 2026
  
  7cec75a7
12 Mar, 2026 8 commits
- Fix: fix the mla_attention layout for the GLM-5 quantized model && add sparse_attn fp8 support · 5b9ad722
  lixh6 authored Mar 12, 2026
  
  5b9ad722
- feat(deepseek-mla): simplify the fused RMS-RoPE concat availability check · 9aabf7e7
  laibao authored Mar 12, 2026
  
  9aabf7e7
- feat(deepseek-mla): wire in the VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT fusion path · cae53e46
  laibao authored Mar 10, 2026
```
Add the environment variable and the MLA fusion wiring (wrapper -> attention -> impl); integrate lightop fused_rms_norm_rope_contiguous while keeping the fallback path
```
  cae53e46
- moe: complete the fill+moe_align fusion switch semantics · 706c031c
  laibao authored Mar 09, 2026
  
  706c031c
- perf: add rmsQuant to the MOE part of the DS V2 model · 168ceef7
  wujl5 authored Mar 12, 2026
  
  168ceef7
- fix: fix bug http://hpczentao.sugon.com/bug-view-118388.html · 8bf99b0b
  wanglong3 authored Mar 12, 2026
  
  8bf99b0b
- perf: add rmsQuant in MLA for the DS V2 model · 2350c778
  wujl5 authored Mar 12, 2026
  
  2350c778
- perf: add DTBMM fusion for DS v2, disabled by default · 6ca1362b
  wujl5 authored Mar 12, 2026
  
  6ca1362b
11 Mar, 2026 1 commit
- feat: fix the dsa mqa interface for glm5 compatibility · 66979358
  liuchy5 authored Mar 11, 2026
  
  66979358