Commits · b2db7ca2cf032009c3b629b1c09ccc836e7ba3de · OpenDAS / vllm_cscc

15 Dec, 2025 1 commit
- [feat] 1. Support the ep_scatter + deepgemm contiguous + ep_gather scheme in high-throughput mode; 2. Support ETP in high-throughput mode, e.g. dp4 tp4 · 3833018c
  王敏 authored Dec 15, 2025
  
  3833018c
10 Dec, 2025 1 commit
- [fix] Fix the vmfault issue in deepep high-throughput mode · 916b5876
  王敏 authored Dec 10, 2025
  
  916b5876
08 Dec, 2025 1 commit
- [feat] Support deepep ETP; dp4 tp4 ep16 shows a clear improvement over dp32 tp1 ep32 · 6cabbf16
  王敏 authored Dec 08, 2025
  
  6cabbf16
02 Dec, 2025 1 commit
- [feat] Support overlapping deepep low-latency mode with shared experts · 1ae8f58c
  王敏 authored Dec 02, 2025
  
  1ae8f58c
13 Nov, 2025 1 commit

- [fix] Fix the moe_fused_gate compilation error and remove the changes to the mtp part of mla · b956fc64
  zhuwenwen authored Nov 13, 2025
  
  restore the default settings of disable_cascade_attn
  add VLLM_USE_OPT_ZEROS to replace triton_ (torch.zeros)
  set default_max_num_batched_tokens = 10240
  update the layernorm of qwen3_moe
  
  b956fc64

07 Nov, 2025 1 commit
- add contiguous+rmsnorm to replace triton_ · dc54fefe
  zhuwenwen authored Nov 07, 2025
  
  dc54fefe
03 Nov, 2025 1 commit
- use apply_rotary_emb_torch for z100l&k100 · a3695a2b
  zhuwenwen authored Nov 03, 2025
  
  a3695a2b
01 Nov, 2025 1 commit
- [feat] Consolidate the mori and deepep code · d698d6f2
  王敏 authored Nov 01, 2025
  
  d698d6f2
29 Oct, 2025 1 commit
- remove redundant envs · c26cfd1a
  zhuwenwen authored Oct 29, 2025
  
  c26cfd1a
27 Oct, 2025 1 commit
- [feat] Adapt w4a8 to deepep ht mode; fix the hang that occurs when mtp > 1 with dp enabled · 98b7432a
  王敏 authored Oct 27, 2025
  
  98b7432a
24 Oct, 2025 1 commit

- add VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD · c2e6f453
  zhuwenwen authored Oct 24, 2025
  
  support prefix cache on kme
  fix the error in test_moe caused by moe align not supporting 511 and 211
  multi-modal switching to torch implementation on z100l&k100
  
  c2e6f453

15 Oct, 2025 4 commits
- update deepseek_v2.py · 4b3e2d5e
  zhuwenwen authored Oct 15, 2025
  
  4b3e2d5e
- update deepseek_v2.py · 4ae3fc04
  zhuwenwen authored Oct 15, 2025
  
  4ae3fc04
- Remove DPSK_FP16_QUICK and add the shared_output interface for awq and blockwiseint8 · 50cb9270
  yangql authored Oct 15, 2025
  
  50cb9270
- Remove DPSK_FP16_QUICK and add the shared_output interface for awq and blockwiseint8 · 7f459b46
  yangql authored Oct 15, 2025
  
  7f459b46
13 Oct, 2025 2 commits
- Remove the all2all ep code · 0b467604
  王敏 authored Oct 13, 2025
  
  0b467604
- support dsv32 · 633f8199
  zhuwenwen authored Oct 13, 2025
  
  633f8199
10 Oct, 2025 1 commit
- fix deepseek_v2.py · 2e1d8e9a
  zhuwenwen authored Oct 10, 2025
  
  2e1d8e9a
30 Sep, 2025 3 commits
- Fix some code · e0ba23b5
  王敏 authored Sep 30, 2025
  
  e0ba23b5
- Fix some code · 46e26bf1
  王敏 authored Sep 30, 2025
  
  46e26bf1
- [feat] Optimize the mori computation logic, support cudagraph, truncate the fused_moe input to bs*ep_size, leave shared experts unsplit across tp, and remove the final allreduce · d2e57a90
  王敏 authored Sep 30, 2025
  
  d2e57a90
25 Sep, 2025 1 commit
- [kernels] add fused_rms_norm_contiguous and rotary_embedding_deepseek_fuse · 49810c37
  zhuwenwen authored Sep 25, 2025
```
[kernels] update moe_align_block_size and moe_sum interface
```
  49810c37
24 Sep, 2025 1 commit
- [kernel] add lightop's moe_sum(mul+add) fusion operator for deepseek · 8d2cac26
  zhuwenwen authored Sep 24, 2025
```
[FIX] Fix the bug where mtp and VLLM_USE_TRITON_CAT cannot be enabled together
```
  8d2cac26
22 Sep, 2025 1 commit
- deepseek-r1-w4a8: mlp/moe call the fused silu-mul-quant operator · 3964e019
  wujl5 authored Sep 22, 2025
  
  3964e019
18 Sep, 2025 1 commit
- [kernel] add VLLM_USE_DEEPSEEK_MOE_SUM_MUL_AND to use lightop's moe_sum fusion operator for deepseek · 0975d9e8
  zhuwenwen authored Sep 18, 2025
  
  0975d9e8
14 Sep, 2025 1 commit
- deepseek-r1-w4a8: use the fused rmsquant operator and horizontal fusion · fc443d52
  wujl5 authored Sep 14, 2025
  
  fc443d52
10 Sep, 2025 1 commit
- update flash-attn interface of apply_rotary_emb · 37707203
  zhuwenwen authored Sep 10, 2025
  
  37707203
09 Sep, 2025 2 commits
- update qwen2_vl and qwen2_5_vl conv layout · 71cb74ff
  zhuwenwen authored Sep 09, 2025
  
  71cb74ff
- update flash-attn interface to support keye · 4d53d14c
  zhuwenwen authored Sep 09, 2025
  
  4d53d14c
04 Sep, 2025 1 commit
- 1. Optimize large EP and merge in grouped gemm · d997afc4
  王敏 authored Sep 04, 2025
```
2. Fix the all gather hang in large-EP inference when mtp > 1
```
  d997afc4
01 Sep, 2025 2 commits
- [feat] Add all2all ep · ffa325e0
  王敏 authored Sep 01, 2025
  
  ffa325e0
- [BugFix] Support deepseek with pp on the v1 engine · 4a946680
  zhuwenwen authored Sep 01, 2025
  
  4a946680
29 Aug, 2025 1 commit
- Fix residual FP16 overflow; resolve the conflict between the mtp sampling rate and dataset accuracy · 080ed180
  yangql authored Aug 29, 2025
  
  080ed180
28 Aug, 2025 1 commit
- [feat] Upload the initial version of the large-EP code based on all2all communication · d04683a4
  王敏 authored Aug 28, 2025
  
  d04683a4
25 Aug, 2025 1 commit
- Temporarily upload the large-ep code · dbd0bda6
  王敏 authored Aug 25, 2025
  
  dbd0bda6
15 Aug, 2025 1 commit
- [fix] Fix the increased GPU memory usage of mtp in eager mode · 49559d79
  王敏 authored Aug 15, 2025
  
  49559d79
07 Aug, 2025 1 commit
- [feat] Support full_cuda_graph for mtp models · 89eecc55
  王敏 authored Aug 07, 2025
  
  89eecc55
06 Aug, 2025 1 commit
- Revert "Merge remote-tracking branch 'origin/v0.9.2-dev-wm' into v0.9.2-dev" · 0c1cd0f5
  zhuwenwen authored Aug 06, 2025
```
This reverts merge request !169
```
  0c1cd0f5
05 Aug, 2025 1 commit
- [feat] 1. Support full_cuda_graph for mtp models; 2. Optimize mtp rejection sampling · 8e0ae19d
  王敏 authored Aug 05, 2025
  
  8e0ae19d
04 Aug, 2025 1 commit
- update conv layout · eba84521
  zhuwenwen authored Aug 04, 2025
  
  eba84521