OpenDAS / vllm_cscc
Commit c50f084a, authored Nov 21, 2025 by jujl1
feat: add zero-overhead scheduling for PP MTP; add the environment variable VLLM_USE_ZERO_MTP, enabled by default
parent d126ce21
Changes: 2 changed files with 659 additions and 9 deletions
vllm/envs.py (+13 -8)
vllm/v1/worker/gpu_model_runner.py (+646 -1)
vllm/envs.py @ c50f084a
...
@@ -182,6 +182,7 @@ if TYPE_CHECKING:
    VLLM_USE_LIGHTOP_FILL_MOE_ALIGN: bool = False
    USE_FUSED_CUSTOM_ALL_REDUCE_RMS_QUANT: bool = False
    VLLM_USE_PP_BALANCE: bool = False
    VLLM_USE_ZERO_MTP: bool = False
    VLLM_USE_CUDA_GRAPH_SIZES: bool = False


def get_default_cache_root():
...
@@ -1186,6 +1187,10 @@ environment_variables: dict[str, Callable[[], Any]] = {
    lambda: (os.getenv('VLLM_USE_PP_BALANCE', '1').lower() in ("true", "1")),
    "VLLM_USE_ZERO_MTP":
    lambda: (os.getenv('VLLM_USE_ZERO_MTP', '1').lower() in ("true", "1")),
    # vllm will use 1-18... (not only 1 2 4 8 16)
    "VLLM_USE_CUDA_GRAPH_SIZES":
    lambda: (os.getenv('VLLM_USE_CUDA_GRAPH_SIZES', 'False').lower() in
...
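The parsing rule used for `VLLM_USE_ZERO_MTP` above can be sketched in isolation to show which values turn the feature on or off. The helper name `zero_mtp_enabled` and its `environ` parameter are hypothetical, introduced only so the rule can be exercised without touching the real process environment; the commit itself calls `os.getenv` directly.

```python
import os
from typing import Mapping


def zero_mtp_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Apply the same rule as the lambda in the diff: default '1', case-insensitive."""
    return environ.get("VLLM_USE_ZERO_MTP", "1").lower() in ("true", "1")


# Unset means enabled, since the default string is '1'.
print(zero_mtp_enabled({}))
# Only 'true' (any case) and '1' enable; everything else disables.
print(zero_mtp_enabled({"VLLM_USE_ZERO_MTP": "0"}))
print(zero_mtp_enabled({"VLLM_USE_ZERO_MTP": "yes"}))
```

Under this rule the feature is switched off by any value other than `1` or `true`, for example `VLLM_USE_ZERO_MTP=0`; values such as `yes` or `on` also disable it, which may be surprising.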
vllm/v1/worker/gpu_model_runner.py @ c50f084a
This diff is collapsed.