OpenDAS / vllm_cscc
Commit 5c54d975 (Unverified)
Authored Aug 01, 2025 by Abirdcfly
Committed by GitHub, Aug 01, 2025
[Bugfix][PD] set max_completion_tokens=1 if req has this value (#21841)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
parent 0a6d305e
Changes: 2
Showing 2 changed files with 4 additions and 0 deletions (+4, -0)
examples/online_serving/disaggregated_serving/disagg_proxy_demo.py (+2, -0)
examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_proxy_p2p_nccl_xpyd.py (+2, -0)
examples/online_serving/disaggregated_serving/disagg_proxy_demo.py @ 5c54d975
...
@@ -293,6 +293,8 @@ class Proxy:
         # add params to request
         kv_prepare_request = request.copy()
         kv_prepare_request["max_tokens"] = 1
+        if "max_completion_tokens" in kv_prepare_request:
+            kv_prepare_request["max_completion_tokens"] = 1

         # prefill stage
         prefill_instance = self.schedule(self.prefill_cycler)
...
examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_proxy_p2p_nccl_xpyd.py @ 5c54d975
...
@@ -128,6 +128,8 @@ async def handle_request():
     prefill_request = original_request_data.copy()
     # change max_tokens = 1 to let it only do prefill
     prefill_request["max_tokens"] = 1
+    if "max_completion_tokens" in prefill_request:
+        prefill_request["max_completion_tokens"] = 1

     global count
     global prefill_instances
...
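Both hunks apply the same pattern: copy the incoming request, force `max_tokens` to 1, and cap `max_completion_tokens` only when the client supplied it. The following is a minimal standalone sketch of that pattern, not code from either file. The helper name `make_prefill_request` and the sample payload (including the model name) are hypothetical and chosen only for illustration.

```python
from typing import Any


def make_prefill_request(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `request` limited to one generated token.

    Hypothetical helper: the proxies in this commit inline this logic.
    """
    # A shallow copy is enough here because only top-level keys are reassigned.
    prefill_request = request.copy()
    prefill_request["max_tokens"] = 1
    # Cap the newer field only when the client sent it, so requests that
    # never used max_completion_tokens are forwarded without a new key.
    if "max_completion_tokens" in prefill_request:
        prefill_request["max_completion_tokens"] = 1
    return prefill_request


# Illustrative payload; the field values are assumptions, not taken from the commit.
original = {
    "model": "example-model",
    "prompt": "Hello",
    "max_completion_tokens": 256,
}
prefill = make_prefill_request(original)
```

Because the helper works on a copy, `original` keeps its `max_completion_tokens` value and can still be forwarded unchanged to the decode stage.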