OpenDAS / vllm_cscc
Commit e66d787b (Unverified)
Authored Oct 14, 2025 by Michael Goin; committed by GitHub, Oct 15, 2025

Disable FlashInfer sampler by default (#26859)

Signed-off-by: mgoin <mgoin64@gmail.com>

parent bfad142e

Showing 1 changed file with 6 additions and 14 deletions

vllm/v1/sample/ops/topk_topp_sampler.py (+6 -14)
@@ -46,23 +46,15 @@ class TopKTopPSampler(nn.Module):
                         "Falling back to default sampling implementation."
                     )
                     self.forward = self.forward_native
-                elif envs.VLLM_USE_FLASHINFER_SAMPLER is not False:
-                    # NOTE(woosuk): The V0 sampler doesn't use FlashInfer for
-                    # sampling unless VLLM_USE_FLASHINFER_SAMPLER=1 (i.e., by
-                    # default it is unused). For backward compatibility, we set
-                    # `VLLM_USE_FLASHINFER_SAMPLER` as None by default and
-                    # interpret it differently in V0 and V1 samplers: In V0,
-                    # None means False, while in V1, None means True. This is
-                    # why we use the condition
-                    # `envs.VLLM_USE_FLASHINFER_SAMPLER is not False` here.
+                elif envs.VLLM_USE_FLASHINFER_SAMPLER:
+                    # Users must opt in explicitly via VLLM_USE_FLASHINFER_SAMPLER=1.
                     logger.info_once("Using FlashInfer for top-p & top-k sampling.")
                     self.forward = self.forward_cuda
                 else:
-                    logger.warning_once(
-                        "FlashInfer is available, but it is not enabled. "
-                        "Falling back to the PyTorch-native implementation of "
-                        "top-p & top-k sampling. For the best performance, "
-                        "please set VLLM_USE_FLASHINFER_SAMPLER=1."
+                    logger.debug_once(
+                        "FlashInfer top-p/top-k sampling is available but disabled "
+                        "by default. Set VLLM_USE_FLASHINFER_SAMPLER=1 to opt in "
+                        "after verifying accuracy for your workloads."
                     )
                     self.forward = self.forward_native
             else:
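
The behavioral change in this diff comes down to how an unset flag is interpreted. The following is a minimal sketch, assuming the flag is None when the environment variable is unset (as the removed comment states). The helper names old_condition and new_condition are hypothetical and exist only for this illustration; they are not part of vLLM.

```python
from typing import Optional


def old_condition(flag: Optional[bool]) -> bool:
    """Pre-commit check: anything except an explicit False selects FlashInfer."""
    return flag is not False


def new_condition(flag: Optional[bool]) -> bool:
    """Post-commit check: only a truthy value selects FlashInfer."""
    return bool(flag)


# Compare both checks for the three possible states of the flag.
for flag in (None, False, True):
    print(f"flag={flag!s:<5}  old={old_condition(flag)!s:<5}  new={new_condition(flag)}")
```

Only the unset (None) case differs between the two checks: it used to select the FlashInfer path and now falls through to the native implementation, which is what makes the sampler opt-in.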