OpenDAS
vllm_cscc
Commits
f2e624d65122db7433bf4fbc74ca1906b7f571b4
vllm_cscc
vllm
11 Aug, 2025
1 commit
Modified vllm/utils/__init__.py
· f2e624d6
xiabo
authored
Aug 11, 2025
f2e624d6
10 Aug, 2025
2 commits
Change the default full_cuda_graph launch mode to false
· 3dad13fb
gaoqiong
authored
Aug 10, 2025
3dad13fb
[fix] Fix the low mtp acceptance rate on v1
· 04b61f0e
王敏
authored
Aug 10, 2025
04b61f0e
09 Aug, 2025
1 commit
Revert "Modified vllm/utils/__init__.py"
· 9ff3592b
zhuwenwen
authored
Aug 09, 2025
This reverts commit 20b6cf64.
9ff3592b
08 Aug, 2025
2 commits
update fa full_cuda_graph support
· 513f17a4
zhuwenwen
authored
Aug 08, 2025
513f17a4
feat: add VLLM_USE_GLOBAL_CACHE13 so that moe uses a global-variable cache13
· 333104ab
jujl1
authored
Jul 31, 2025
333104ab
07 Aug, 2025
6 commits
Add support in SlimQuantW4A8Int8MoEMethod for obtaining intermediate_size_per_partition
· e92bb9ea
gaoqiong
authored
Aug 07, 2025
e92bb9ea
Add lmslimquant_w4a8 quantization support
· 8b1e4ef0
gaoqiong
authored
Aug 07, 2025
8b1e4ef0
[feat] Support full_cuda_graph for mtp models
· bd58c289
王敏
authored
Aug 07, 2025
bd58c289
[feat] Support full_cuda_graph for mtp models
· 89eecc55
王敏
authored
Aug 07, 2025
89eecc55
[feat] Support full_cuda_graph for mtp models
· a1239b53
王敏
authored
Aug 07, 2025
a1239b53
Modified vllm/utils/__init__.py
· 20b6cf64
xiabo
authored
Aug 07, 2025
20b6cf64
06 Aug, 2025
7 commits
update VLLM_FLASH_ATTN_V1 to VLLM_USE_FLASH_ATTN_PA
· 88dbf92c
zhuwenwen
authored
Aug 06, 2025
88dbf92c
update benchmark_throughput.py
· fe657b8b
zhuwenwen
authored
Aug 06, 2025
fe657b8b
update warmup_sampling_params
· 966ebb2b
zhuwenwen
authored
Aug 06, 2025
966ebb2b
[feat] Support full_cuda_graph for mtp models
· 9dd945c1
王敏
authored
Aug 06, 2025
9dd945c1
Revert "Merge remote-tracking branch 'origin/v0.9.2-dev-wm' into v0.9.2-dev"
· 0c1cd0f5
zhuwenwen
authored
Aug 06, 2025
This reverts merge request !169
0c1cd0f5
update lmslim import
· 0d4ff65d
zhuwenwen
authored
Aug 06, 2025
0d4ff65d
Revert "update lmslim import"
· 3ae8665d
zhuwenwen
authored
Aug 06, 2025
This reverts commit 1d575d52.
3ae8665d
05 Aug, 2025
7 commits
[feat] Improve the return types of mtp-related functions
· 7e71c143
王敏
authored
Aug 05, 2025
7e71c143
merge and debug tbo on 0.9.2
· 3f8b2afe
lizhigong
authored
Aug 05, 2025
3f8b2afe
[feat] 1. Support full_cuda_graph for mtp models; 2. Optimize mtp rejection sampling
· 8e0ae19d
王敏
authored
Aug 05, 2025
8e0ae19d
update lmslim import
· 1d575d52
zhuwenwen
authored
Aug 05, 2025
1d575d52
add glm4.5 k100-ai config
· d160ae26
zhuwenwen
authored
Aug 05, 2025
d160ae26
add step3-vl k100-ai config
· 3e1ed13b
zhuwenwen
authored
Aug 05, 2025
3e1ed13b
when using VLLM_FLASH_ATTN_V1, set block_size to 64
· 80a682c7
zhuwenwen
authored
Aug 05, 2025
80a682c7
04 Aug, 2025
4 commits
add step3-vl config
· 8e1c204b
zhuwenwen
authored
Aug 04, 2025
8e1c204b
add step3-vl tuning
· 2d364c4e
zhuwenwen
authored
Aug 04, 2025
2d364c4e
add tbo on v1 engine
· 20e75ed6
lizhigong
authored
Aug 02, 2025
20e75ed6
update conv layout
· eba84521
zhuwenwen
authored
Aug 04, 2025
eba84521
02 Aug, 2025
1 commit
add glm4.5 config
· 94b06a94
zhuwenwen
authored
Aug 02, 2025
94b06a94
01 Aug, 2025
9 commits
set default block_size to 16
· 80045bf7
zhuwenwen
authored
Aug 01, 2025
80045bf7
update N to N1
· 8c7075d1
zhuwenwen
authored
Aug 01, 2025
8c7075d1
Add changes for w4a8 support
· 2767fc34
gaoqiong
authored
Aug 01, 2025
2767fc34
back to default conv layout
· 5f18e876
zhuwenwen
authored
Aug 01, 2025
5f18e876
update rocm.py
· 0480314d
zhuwenwen
authored
Aug 01, 2025
0480314d
[Model] Update step3 vl
· 66540380
zhuwenwen
authored
Aug 01, 2025
66540380
[Model] Add step3 vl
· 53ffe40e
zhuwenwen
authored
Aug 01, 2025
53ffe40e
[fix] Prevent the cudagraph adaptation in mla from affecting the non-parallel-decoding logic
· 0e5d399a
王敏
authored
Aug 01, 2025
0e5d399a
update HIP_VISIBLE_DEVICES of rocm
· d0cc5577
zhuwenwen
authored
Aug 01, 2025
d0cc5577