Commits · dc1ad9d1bc55a27865f4d254693ca7d6392b39a1 · OpenDAS / vllm_cscc

15 Sep, 2025 2 commits
- remove logger.warning · 5eec6110
  zhuwenwen authored Sep 15, 2025
- update moe_align_block_size · d251d4f3
  zhuwenwen authored Sep 15, 2025
14 Sep, 2025 4 commits
- deepseek-r1-w4a8: use the rmsquant fused operator and horizontal fusion · fc443d52
  wujl5 authored Sep 14, 2025
- [P/D][Feature] add TBO support for P instances under P/D disaggregation · fc0e53fe
  xuxzh1 authored Sep 14, 2025
- improve release handling, including release of cpu and tensor_id · bfe12894
  xuxzh1 authored Sep 14, 2025
- [fix] fix wrong type when eagle creates cu_num_tokens · b4610c06
  王敏 authored Sep 14, 2025
13 Sep, 2025 3 commits
- add VLLM_USE_MERGE_ATTN_STATES_OPT to control merge_attn_states support · 2071c380
  zhuwenwen authored Sep 13, 2025
- set VLLM_HAS_CONTEXT_DEFAULT=1 · 48b4c41d
  zhuwenwen authored Sep 13, 2025
- update triton cat scheduling · b1329ff2
  zhuwenwen authored Sep 13, 2025
```
update the default values of VLLM_USE_TRITON_CAT and VLLM_USE_LIGHT_OP to True
```
12 Sep, 2025 1 commit
- update the cat implementation of triton's non contiguous memory for the decode phase · 2c169409
  zhuwenwen authored Sep 12, 2025
11 Sep, 2025 3 commits
- add support for dpsk-v3.1-awq · e5f51b79
  yangql authored Sep 11, 2025
- update fa interface · bf3d75f4
  zhuwenwen authored Sep 11, 2025
- fix: use the lightop operator for weight reordering in w4a8 marlin · 082d41a1
  jujl1 authored Sep 10, 2025
10 Sep, 2025 8 commits
- add rocm merge_attn_states · f6324f60
  zhuwenwen authored Sep 10, 2025
- update triton kernel to optimize torch cat for ds prefill · 90e40f49
  zhuwenwen authored Sep 10, 2025
- support cascade_attention · ead74dfa
  zhuwenwen authored Sep 10, 2025
- update the conditions for pad_v on v0 · d6dc122f
  zhuwenwen authored Sep 10, 2025
- update the conditions for pad_v · aaa89e82
  zhuwenwen authored Sep 10, 2025
- update flash-attn interface of apply_rotary_emb · 37707203
  zhuwenwen authored Sep 10, 2025
- update VLLM_USE_TRITON_CAT during the prefill phase · 7d959770
  zhuwenwen authored Sep 10, 2025
- use VLLM_USE_TRITON_CAT during the prefill phase · 072d4638
  zhuwenwen authored Sep 10, 2025
09 Sep, 2025 6 commits
- update flashmla.py · ff090f36
  zhuwenwen authored Sep 09, 2025
- import concatv3Tritonfinal · 5973c805
  zhuwenwen authored Sep 09, 2025
- add VLLM_USE_TRITON_CAT to opt torch cat · 379315a0
  zhuwenwen authored Sep 09, 2025
- update qwen2_vl and qwen2_5_vl conv layout · 71cb74ff
  zhuwenwen authored Sep 09, 2025
- update flash-attn interface to support keye · 4d53d14c
  zhuwenwen authored Sep 09, 2025
- fix: reduce weight reordering time in w4a8 marlin · ffab74dd
  jujl1 authored Sep 09, 2025
08 Sep, 2025 1 commit
- update op.moe_fused_gate · bc9aee38
  zhuwenwen authored Sep 08, 2025
07 Sep, 2025 2 commits
- add VLLM_USE_LIGHT_OP to optimize moe_align_block_size and moe_fused_gate · a54ab95d
  zhuwenwen authored Sep 07, 2025
- support w4a8 ep · 6372a1f3
  王敏 authored Sep 07, 2025
06 Sep, 2025 6 commits
- Update version.py · 59a345ac
  gaoqiong authored Sep 06, 2025
- revert the modified version.py · 4e471b8a
  gaoqiong authored Sep 06, 2025
- [P/D][Feature] optimize GPU memory scheduling and release tensors promptly · ed392378
  zhuwenwen authored Sep 06, 2025
- [fix]fix tests of neuron, quantization etc · dc2aff4c
  zhuwenwen authored Sep 06, 2025
- fix precision issue in mtp · 2a4a2877
  lizhigong authored Sep 05, 2025
- add reduce changes · 863176e5
  SAC_fanth authored Sep 06, 2025
04 Sep, 2025 4 commits
- fix performance issues caused by enabling TBO · 424cccfe
  lizhigong authored Sep 04, 2025
- fix bugs in zero overhead and tbo · f6f8db81
  lizhigong authored Sep 03, 2025
- temporarily remove the profilling flag to avoid affecting other models · 8d971060
  王敏 authored Sep 04, 2025
- add glm-4.5-air-tp8 nn moe configs · 7a97637e
  zhuwenwen authored Sep 04, 2025