OpenDAS
vllm_cscc
Commits
dc1ad9d1bc55a27865f4d254693ca7d6392b39a1
vllm_cscc
vllm
15 Sep, 2025
2 commits
remove logger.warning
· 5eec6110
zhuwenwen
authored
Sep 15, 2025
5eec6110
update moe_align_block_size
· d251d4f3
zhuwenwen
authored
Sep 15, 2025
d251d4f3
14 Sep, 2025
4 commits
deepseek-r1-w4a8: use the rmsquant fused operator and horizontal fusion
· fc443d52
wujl5
authored
Sep 14, 2025
fc443d52
[P/D][Feature] Add tbo support for the P instance in P/D disaggregation
· fc0e53fe
xuxzh1
authored
Sep 14, 2025
fc0e53fe
Improve release handling, including cpu and tensor_id release
· bfe12894
xuxzh1
authored
Sep 14, 2025
bfe12894
[fix] Fix the type error when eagle creates cu_num_tokens
· b4610c06
王敏
authored
Sep 14, 2025
b4610c06
13 Sep, 2025
3 commits
add VLLM_USE_MERGE_ATTN_STATES_OPT to control merge_attn_states support
· 2071c380
zhuwenwen
authored
Sep 13, 2025
2071c380
set VLLM_HAS_CONTEXT_DEFAULT=1
· 48b4c41d
zhuwenwen
authored
Sep 13, 2025
48b4c41d
update triton cat scheduling
· b1329ff2
zhuwenwen
authored
Sep 13, 2025
update the default values of VLLM_USE_TRITON_CAT and VLLM_USE_LIGHT_OP to True
b1329ff2
12 Sep, 2025
1 commit
update the cat implementation of triton's non contiguous memory for the decode phase
· 2c169409
zhuwenwen
authored
Sep 12, 2025
2c169409
11 Sep, 2025
3 commits
Add support for dpsk-v3.1-awq
· e5f51b79
yangql
authored
Sep 11, 2025
e5f51b79
update fa interface
· bf3d75f4
zhuwenwen
authored
Sep 11, 2025
bf3d75f4
fix: use the lightop operator for weight reordering in w4a8 marlin
· 082d41a1
jujl1
authored
Sep 10, 2025
082d41a1
10 Sep, 2025
8 commits
add rocm merge_attn_states
· f6324f60
zhuwenwen
authored
Sep 10, 2025
f6324f60
update triton kernel to optimize torch cat for ds prefill
· 90e40f49
zhuwenwen
authored
Sep 10, 2025
90e40f49
support cascade_attention
· ead74dfa
zhuwenwen
authored
Sep 10, 2025
ead74dfa
update the conditions for pad_v on v0
· d6dc122f
zhuwenwen
authored
Sep 10, 2025
d6dc122f
update the conditions for pad_v
· aaa89e82
zhuwenwen
authored
Sep 10, 2025
aaa89e82
update flash-attn interface of apply_rotary_emb
· 37707203
zhuwenwen
authored
Sep 10, 2025
37707203
update VLLM_USE_TRITON_CAT during the prefill phase
· 7d959770
zhuwenwen
authored
Sep 10, 2025
7d959770
use VLLM_USE_TRITON_CAT during the prefill phase
· 072d4638
zhuwenwen
authored
Sep 10, 2025
072d4638
09 Sep, 2025
6 commits
update flashmla.py
· ff090f36
zhuwenwen
authored
Sep 09, 2025
ff090f36
import concatv3Tritonfinal
· 5973c805
zhuwenwen
authored
Sep 09, 2025
5973c805
add VLLM_USE_TRITON_CAT to opt torch cat
· 379315a0
zhuwenwen
authored
Sep 09, 2025
379315a0
update qwen2_vl and qwen2_5_vl conv layout
· 71cb74ff
zhuwenwen
authored
Sep 09, 2025
71cb74ff
update flash-attn interface to support keye
· 4d53d14c
zhuwenwen
authored
Sep 09, 2025
4d53d14c
fix: reduce the time spent on weight reordering in w4a8 marlin
· ffab74dd
jujl1
authored
Sep 09, 2025
ffab74dd
08 Sep, 2025
1 commit
update op.moe_fused_gate
· bc9aee38
zhuwenwen
authored
Sep 08, 2025
bc9aee38
07 Sep, 2025
2 commits
add VLLM_USE_LIGHT_OP to optimize moe_align_block_size and moe_fused_gate
· a54ab95d
zhuwenwen
authored
Sep 07, 2025
a54ab95d
support w4a8 ep
· 6372a1f3
王敏
authored
Sep 07, 2025
6372a1f3
06 Sep, 2025
6 commits
Update version.py
· 59a345ac
gaoqiong
authored
Sep 06, 2025
59a345ac
Revert the changes to version.py
· 4e471b8a
gaoqiong
authored
Sep 06, 2025
4e471b8a
[P/D][Feature] Optimize GPU memory scheduling and release tensors promptly
· ed392378
zhuwenwen
authored
Sep 06, 2025
ed392378
[fix]fix tests of neuron, quantization etc
· dc2aff4c
zhuwenwen
authored
Sep 06, 2025
dc2aff4c
fix precision issue in mtp
· 2a4a2877
lizhigong
authored
Sep 05, 2025
2a4a2877
Add reduce changes
· 863176e5
SAC_fanth
authored
Sep 06, 2025
863176e5
04 Sep, 2025
4 commits
fix performance issues caused by enabling TBO
· 424cccfe
lizhigong
authored
Sep 04, 2025
424cccfe
fix bugs in zero overhead and tbo
· f6f8db81
lizhigong
authored
Sep 03, 2025
f6f8db81
Temporarily remove the profiling flag to avoid affecting other models
· 8d971060
王敏
authored
Sep 04, 2025
8d971060
add glm-4.5-air-tp8 nn moe configs
· 7a97637e
zhuwenwen
authored
Sep 04, 2025
7a97637e