OpenDAS / vllm_cscc
Commit 2ad10292 (Unverified)
Authored Apr 14, 2026 by Wentao Ye; committed by GitHub, Apr 14, 2026

[Bug] Fix batch invariance nvfp4 support (#39820)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Parent: b2f749dc

Changes: 2 changed files with 8 additions and 1 deletion (+8 -1)
.buildkite/test_areas/misc.yaml  (+1 -0)
vllm/model_executor/kernels/linear/__init__.py  (+7 -1)

.buildkite/test_areas/misc.yaml
@@ -224,6 +224,7 @@ steps:
   - pytest -v -s v1/determinism/test_rms_norm_batch_invariant.py
   - VLLM_TEST_MODEL=deepseek-ai/DeepSeek-V2-Lite-Chat pytest -v -s v1/determinism/test_batch_invariance.py::test_v1_generation_is_deterministic_across_batch_sizes_with_needle[TRITON_MLA]
   - VLLM_TEST_MODEL=Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 pytest -v -s v1/determinism/test_batch_invariance.py::test_v1_generation_is_deterministic_across_batch_sizes_with_needle[FLASH_ATTN]
+  - pytest -v -s v1/determinism/test_nvfp4_batch_invariant.py
 
 - label: Acceptance Length Test (Large Models) # optional
   timeout_in_minutes: 25
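
The new CI step runs v1/determinism/test_nvfp4_batch_invariant.py, whose contents are not shown in this commit. As a rough illustration of the property such a test targets, the sketch below checks that one input row (the "needle") produces bitwise-identical output whether it is processed alone or inside a larger batch. All function names here are hypothetical and the arithmetic is plain Python floats, not NVFP4; it is not the test's actual code.

```python
def row_linear(x_row, weight):
    """Dot x_row with each weight row, accumulating left to right."""
    out = []
    for w_row in weight:
        acc = 0.0
        for x, w in zip(x_row, w_row):
            acc += x * w
        out.append(acc)
    return out


def batched_linear(batch, weight):
    """Batch-invariant: every row uses the same fixed reduction order."""
    return [row_linear(row, weight) for row in batch]


def order_dependent_linear(batch, weight):
    """Not batch-invariant: the reduction order flips when the batch grows."""
    if len(batch) > 1:
        flipped = [w[::-1] for w in weight]
        return [row_linear(row[::-1], flipped) for row in batch]
    return batched_linear(batch, weight)


def is_batch_invariant(fn, needle, fillers, weight):
    """Compare the needle's output alone against its output in larger batches."""
    alone = fn([needle], weight)[0]
    for n in range(1, len(fillers) + 1):
        batched = fn(fillers[:n] + [needle], weight)[-1]
        if batched != alone:  # exact comparison on purpose, no tolerance
            return False
    return True


if __name__ == "__main__":
    weight = [[1.0, 1.0, 1.0]]
    needle = [0.1, 0.2, 0.3]
    fillers = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(is_batch_invariant(batched_linear, needle, fillers, weight))
    print(is_batch_invariant(order_dependent_linear, needle, fillers, weight))
```

The comparison is exact rather than tolerance-based because floating-point addition is not associative: a kernel that changes its reduction order with batch size can return slightly different values for the same row.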

vllm/model_executor/kernels/linear/__init__.py
@@ -601,7 +601,13 @@ def init_nvfp4_linear_kernel() -> NvFp4LinearKernel:
 
     # Env-var overrides.
     force_kernel: type[NvFp4LinearKernel] | None = None
-    if envs.VLLM_USE_FBGEMM:
+    if envs.VLLM_BATCH_INVARIANT:
+        logger.info_once(
+            "VLLM_BATCH_INVARIANT forces NVFP4 linear to use the "
+            "emulation backend for deterministic execution."
+        )
+        force_kernel = EmulationNvFp4LinearKernel
+    elif envs.VLLM_USE_FBGEMM:
         force_kernel = FbgemmNvFp4LinearKernel
     elif envs.VLLM_USE_NVFP4_CT_EMULATIONS:
         force_kernel = EmulationNvFp4LinearKernel
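
The hunk above changes the order in which environment overrides are checked: VLLM_BATCH_INVARIANT now comes first, ahead of VLLM_USE_FBGEMM and VLLM_USE_NVFP4_CT_EMULATIONS. A minimal standalone sketch of that precedence follows. The kernel classes are empty stand-ins, and `_flag` and `pick_forced_kernel` are hypothetical helpers; vLLM's real `envs` module parses these variables itself, and the truthy-string rule used here is only an assumption.

```python
import os


# Empty stand-ins for the kernel classes named in the diff.
class NvFp4LinearKernel:
    pass


class EmulationNvFp4LinearKernel(NvFp4LinearKernel):
    pass


class FbgemmNvFp4LinearKernel(NvFp4LinearKernel):
    pass


def _flag(env, name):
    """Assumed parsing rule: "1" or "true" enables a flag (not vLLM's own logic)."""
    return env.get(name, "0").strip().lower() in ("1", "true")


def pick_forced_kernel(env=None):
    """Return the forced kernel class, or None when no override is set."""
    env = os.environ if env is None else env
    if _flag(env, "VLLM_BATCH_INVARIANT"):
        # Highest priority: batch invariance wins even if FBGEMM is requested.
        return EmulationNvFp4LinearKernel
    if _flag(env, "VLLM_USE_FBGEMM"):
        return FbgemmNvFp4LinearKernel
    if _flag(env, "VLLM_USE_NVFP4_CT_EMULATIONS"):
        return EmulationNvFp4LinearKernel
    return None


if __name__ == "__main__":
    both = {"VLLM_BATCH_INVARIANT": "1", "VLLM_USE_FBGEMM": "1"}
    print(pick_forced_kernel(both).__name__)
```

With both VLLM_BATCH_INVARIANT and VLLM_USE_FBGEMM set, the sketch selects the emulation kernel, which mirrors the branch order in the diff.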