OpenDAS / vllm_cscc

Commit c57bb199 (Unverified)
Authored Jun 12, 2025 by Russell Bryant; committed by GitHub on Jun 12, 2025

[V1] Resolve failed concurrent structured output requests (#19565)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

parent dba68f91
Showing 1 changed file with 8 additions and 1 deletion

vllm/v1/worker/gpu_model_runner.py @ c57bb199 (+8 -1)
...
...
@@ -66,11 +66,15 @@ from .utils import (gather_mm_placeholders, initialize_kv_cache_for_kv_sharing,
 if TYPE_CHECKING:
     import xgrammar as xgr
+    import xgrammar.kernels.apply_token_bitmask_inplace_torch_compile as xgr_torch_compile  # noqa: E501

     from vllm.model_executor.model_loader.tensorizer import TensorizerConfig
     from vllm.v1.core.sched.output import SchedulerOutput
 else:
     xgr = LazyLoader("xgr", globals(), "xgrammar")
+    xgr_torch_compile = LazyLoader(
+        "xgr_torch_compile", globals(),
+        "xgrammar.kernels.apply_token_bitmask_inplace_torch_compile")

 logger = init_logger(__name__)
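Note on the hunk above: the new `xgr_torch_compile` name follows the same pattern as `xgr`, a real import under `TYPE_CHECKING` for type checkers and a lazy proxy at runtime. The sketch below illustrates the general idea of deferring an import until first attribute access. `LazyModule` is a hypothetical stand-in written for this note; it is not vLLM's `LazyLoader` and does not reproduce its `(local_name, parent_globals, name)` signature.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports a module on first attribute access."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None  # nothing is imported yet

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself,
        # so _module_name and _module never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attr)


# Demo with a stdlib module standing in for an optional heavy dependency.
lazy_json = LazyModule("json")
loaded_before_use = lazy_json._module is not None
encoded = lazy_json.dumps([1])  # first attribute access triggers the import
loaded_after_use = lazy_json._module is not None
```

With this kind of proxy, a process that never reaches the structured-output path would not pay the import cost of the proxied module.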
...
...
@@ -1103,7 +1107,10 @@ class GPUModelRunner(LoRAModelRunnerMixin):
         # so we receive it in that format.
         grammar_bitmask = torch.from_numpy(grammar_bitmask)

-        xgr.apply_token_bitmask_inplace(
+        # Force use of the torch.compile implementation from xgrammar to work
+        # around issues with the Triton kernel in concurrent structured output
+        # scenarios. See PR #19565 and issues #19493, #18376 for details.
+        xgr_torch_compile.apply_token_bitmask_inplace_torch_compile(
             logits,
             grammar_bitmask.to(self.device, non_blocking=True),
             indices=out_indices,
...
...
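Note on the call changed above: both the removed and the added function take the logits, a grammar bitmask, and an `indices` argument. As a rough illustration of what applying a token bitmask means, here is a pure-Python sketch. It assumes a packed layout in which each 32-bit word covers 32 consecutive token ids and a set bit marks an allowed token, and it assumes bitmask row `i` pairs with logits row `i`. The function name `apply_token_bitmask` is hypothetical; this is neither xgrammar's Triton kernel nor its torch.compile implementation.

```python
import math


def apply_token_bitmask(logits, bitmask, indices=None):
    """Set logits of disallowed tokens to -inf, in place.

    Assumed layout (for illustration only): bitmask[row][w] is a 32-bit
    word, and bit j of word w is 1 when token id 32 * w + j is allowed.
    Only the rows listed in `indices` are touched; None means all rows.
    """
    rows = range(len(logits)) if indices is None else indices
    for row in rows:
        for token_id in range(len(logits[row])):
            word, bit = divmod(token_id, 32)
            allowed = (bitmask[row][word] >> bit) & 1
            if not allowed:
                logits[row][token_id] = -math.inf
    return logits


# Two requests, vocabulary of 4 tokens; only row 0 is grammar-constrained.
demo_logits = [[0.5, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
demo_bitmask = [[0b0101], [0b0000]]  # row 0 allows token ids 0 and 2
apply_token_bitmask(demo_logits, demo_bitmask, indices=[0])
```

Restricting the update to `indices` leaves unconstrained requests in the same batch untouched, which is why row 1 keeps its logits here even though its bitmask word is zero.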