Commits · a1abfaf334c71d773f927f25c656a595e9b057d4 · OpenDAS / vllm_cscc

09 Feb, 2026 1 commit
- fix: Fix undefined variable in EP · bee0b4e8
  wujl5 authored Feb 09, 2026
  
  bee0b4e8
05 Feb, 2026 1 commit

feat: Support shared experts fusion. · bbe4df8b

wanglong3 authored Jan 29, 2026

feat: support moe sum when topk==9

bugfix: Fix MTP model load error when shared experts fusion is enabled.

bbe4df8b

22 Jan, 2026 1 commit
- Optimize epsp code · 9135afe4
  王敏 authored Jan 22, 2026
  
  9135afe4
21 Jan, 2026 1 commit

perf(qwen3): Fuse q/k RMSNorm + RoPE · 7cd7bf8a

laibao authored Jan 21, 2026

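Add a VLLM_USE_FUSED_RMS_ROPE branch that takes the fused path
Register torch.ops.vllm.rms_rotary_embedding_fuse (via direct_register_custom_op)
Automatically convert cos_sin_cache to the target device/dtype and cache the result, avoiding a repeated copy on every call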

7cd7bf8a
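
The cos_sin_cache caching mentioned in the commit body above can be pictured with a minimal sketch. This is not the code from the commit: `ConvertedCache` and its `convert` callback are hypothetical names introduced for illustration, and the sketch only shows the general idea of converting once per (device, dtype) pair and reusing the result on later calls.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ConvertedCache:
    """Memoize a converted copy of a source object per (device, dtype) key.

    Hypothetical helper for illustration only. With PyTorch, `convert`
    could be: lambda t, device, dtype: t.to(device=device, dtype=dtype)
    """

    def __init__(self, source: Any,
                 convert: Callable[[Any, Hashable, Hashable], Any]) -> None:
        self._source = source
        self._convert = convert
        self._converted: Dict[Tuple[Hashable, Hashable], Any] = {}

    def get(self, device: Hashable, dtype: Hashable) -> Any:
        key = (device, dtype)
        if key not in self._converted:
            # First request for this (device, dtype): pay the copy once.
            self._converted[key] = self._convert(self._source, device, dtype)
        # Later requests reuse the stored copy instead of converting again.
        return self._converted[key]
```

Keying on both device and dtype means a second request with a different dtype triggers its own one-time conversion rather than returning a mismatched copy.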

16 Jan, 2026 2 commits

add VLLM_USE_FUSED_CACHE_QUANT_BMM_MLA to use fused rmsnorm + contiguous +... · 9dd70f0e

zhuwenwen authored Jan 16, 2026

add VLLM_USE_FUSED_CACHE_QUANT_BMM_MLA to use fused rmsnorm + contiguous + rope (for dpsk-v3) + concat_and_cache_mla + q quant, control bmm (todo) + cat + mla (fp8)

9dd70f0e

MoE router capture: add router_capture toolchain and unified envs configuration · a2f0ce42

laibao authored Jan 16, 2026

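Add environment variables VLLM_MOE_ROUTER_CAPTURE / DIR / RANK / MAX_LAYERS / NUM_TOKENS_* for on/off switching and filtering
Add router_capture.py, which captures router logits bucketed by num_tokens and writes them to disk
Hook the capture logic into qwen3_moe; it is disabled by default and records only when enabled
Pin skip_profile / skip_stack_funcs to enabled by default, to avoid capturing warmup/profile shapes
Unify the configuration entry point in vllm.envs as the runtime baseline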

a2f0ce42

15 Jan, 2026 2 commits
- Switch default w8a8 gemm impl to blaslt. · 5663e01d
  wanglong3 authored Jan 15, 2026
  
  5663e01d
- Fix AWQ quantized inference bug and accuracy issues for DeepSeek MoE models · 475dcaa0
  yangql authored Jan 15, 2026
  
  475dcaa0
12 Jan, 2026 2 commits
- [feat] Add DP attention support · cda54326
  王敏 authored Jan 12, 2026
  
  cda54326
- fix: Fix assertion error when the fused graph is not enabled. · 9cf5c476
  wujl5 authored Jan 12, 2026
  
  9cf5c476
08 Jan, 2026 1 commit
- feat: Support enabling rms quant and shared expert overlap at the same time. · 989a0a2b
  wanglong3 authored Jan 08, 2026
  
  989a0a2b
07 Jan, 2026 1 commit
- DS quantized models: refactor how attention and MoE invoke the rmsquant fusion logic. · 89b62a25
  wujl5 authored Jan 07, 2026
  
  89b62a25
05 Jan, 2026 2 commits
- feat: enable shared expert overlap. · ee19dca6
  wanglong3 authored Jan 03, 2026
  
  ee19dca6
- feat: Add MultiModalConfigProxy for GLM4 and Llama models to support both flat configs and nested multimodal configs (text_config) · 952f0347
  laibao authored Jan 05, 2026
  
  952f0347
04 Jan, 2026 1 commit
- perf: Fuse the qa and kva GEMMs for DS quantized models · 577eb49f
  wujl5 authored Jan 04, 2026
  
  577eb49f
23 Dec, 2025 2 commits
- update fuse_fill_rms_x2_concat · bac269d7
  zhuwenwen authored Dec 23, 2025
  
  bac269d7
- add VLLM_USE_FUSED_FILL_RMS_CAT for dpsk mtp fill + rms*2 + cat · e80dcabe
  zhuwenwen authored Dec 23, 2025
```
update VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT impl
```
  e80dcabe
22 Dec, 2025 3 commits
- [fix] 1. Fix the MTP error introduced by the EP sequence parallel optimization; 2. Fix shared experts being unable to overlap with combine · dc01fce4
  王敏 authored Dec 22, 2025
  
  dc01fce4
- [feat] 1. Optimize EP sequence parallel, separating main-model and MTP logic; 2. EP sequence parallel: add cudagraph... · 62f05dde
  王敏 authored Dec 22, 2025
```
[feat] 1. Optimize EP sequence parallel, separating main-model and MTP logic; 2. EP sequence parallel: add cudagraph padding to tp_size; 3. Fix shared experts and deepep combine overlap
```
  62f05dde
- Fix crash in EP auto mode · 29523973
  yangql authored Dec 22, 2025
  
  29523973
20 Dec, 2025 1 commit
- update qwen3_moe.py · 4f9947e6
  zhuwenwen authored Dec 20, 2025
  
  4f9947e6
18 Dec, 2025 2 commits
- [feat] Merge deepep-based large-scale EP · 13130b89
  王敏 authored Dec 18, 2025
  
  13130b89
- [feat] Optimize DP attention by saving the cost of one allgather; noticeable improvement in high-throughput mode · 3e386c3b
  王敏 authored Dec 18, 2025
  
  3e386c3b
17 Dec, 2025 1 commit
- remove fuse_rmsnorm_rope_quant_gfx938 · 99981972
  zhuwenwen authored Dec 17, 2025
  
  99981972
16 Dec, 2025 2 commits
- add fuse_rmsnorm_rope_quant_gfx938 to support use fp8_e4m3 mla · 0ce3b670
  zhuwenwen authored Dec 16, 2025
  
  0ce3b670
- up auto deepep · 0d3ae2fc
  yangql authored Dec 16, 2025
  
  0d3ae2fc
15 Dec, 2025 1 commit
- [feat] 1. Support the ep_scatter + deepgemm contiguous + ep_gather scheme in high-throughput mode; 2. Support ETP in high-throughput mode, e.g. dp4 tp4 · 3833018c
  王敏 authored Dec 15, 2025
  
  3833018c
14 Dec, 2025 1 commit

feat: Add RMSNorm + RoPE fusion optimization for Qwen3 MoE and a qwen3-480B tp8 MoE config file · 6a5443d4

laibao authored Dec 14, 2025

- Add the rms_rotary_embedding_fuse custom op
- Add a kernel config file for E=160,N=320
- Control the fused path via the VLLM_USE_FUSED_RMS_ROPE environment variable

6a5443d4

10 Dec, 2025 1 commit
- [fix] Fix vmfault in deepep high-throughput mode · 916b5876
  王敏 authored Dec 10, 2025
  
  916b5876
08 Dec, 2025 1 commit
- [feat] Support deepep ETP; dp4 tp4 ep16 shows a clear improvement over dp32 tp1 ep32 · 6cabbf16
  王敏 authored Dec 08, 2025
  
  6cabbf16
04 Dec, 2025 1 commit
- add VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT when use USE_FUSED_RMS_QUANT and... · de87d606
  zhuwenwen authored Dec 04, 2025
```
add VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT when use USE_FUSED_RMS_QUANT and USE_FUSED_CUSTOM_ALL_REDUCE_RMS_QUANT
```
  de87d606
02 Dec, 2025 2 commits
- update weight and cache · fd559b9f
  zhuwenwen authored Dec 02, 2025
  
  fd559b9f
- [feat] Support overlapping deepep low-latency mode with shared experts · 1ae8f58c
  王敏 authored Dec 02, 2025
  
  1ae8f58c
01 Dec, 2025 1 commit
- add VLLM_USE_OPT_RESHAPE_AND_CACHE (test) · 64e307c7
  zhuwenwen authored Dec 01, 2025
  
  64e307c7
26 Nov, 2025 1 commit
- [perf]: Fuse away moe.quant in the DS_v2_w8a8 model · 68972532
  wujl5 authored Nov 26, 2025
  
  68972532
20 Nov, 2025 2 commits
- Add slilu_mul_quant support to DS_v2_w4a8_CRQ · 075841f3
  wujl5 authored Nov 20, 2025
  
  075841f3
- Add rms_cuda_opt to the CRQ fusion branch of the MLA module · 8c646ebe
  wujl5 authored Nov 20, 2025
  
  8c646ebe
18 Nov, 2025 1 commit
- Support custom-rms-quant fusion for deepseekv2-w4a8 · 3e6729e0
  wujl5 authored Nov 18, 2025
  
  3e6729e0
13 Nov, 2025 2 commits
- update qwen3 of rmsnorm · 8375370f
  zhuwenwen authored Nov 13, 2025
  
  8375370f
- [fix] Fix moe_fused_gate compile error; remove the MTP-related changes in mla · b956fc64
  zhuwenwen authored Nov 13, 2025
```
restore the default settings of disable_cascade_attn
add VLLM_USE_OPT_ZEROS to replace triton_ (torch.zeros)
set default_max_num_batched_tokens = 10240
update qwen3_moe of layernorm
```
  b956fc64