OpenDAS / vllm_cscc
Commit d7963752 (Unverified)
Authored Oct 16, 2025 by XiaobingZhang
Committed by GitHub on Oct 15, 2025
[ModelOpt] Remove NVFP4 MoE K%16==0 constraint (#26891)
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>
Parent: 14f84563
Showing 1 changed file with 0 additions and 12 deletions
vllm/model_executor/layers/quantization/modelopt.py
@@ -1542,23 +1542,11 @@ class ModelOptNvFp4FusedMoE(FusedMoEMethodBase):
             del layer.w2_input_scale_quant
         else:
             # Non-TRT-LLM processing (Cutlass or non-flashinfer)
-            assert layer.w13_weight_scale.shape[2] % 16 == 0, (
-                "Expected weight_scale.dim(1) to be divisible by 16"
-            )
-            assert layer.w13_weight_scale.dtype == torch.float8_e4m3fn, (
-                "Weight Blockscale must be represented as FP8-E4M3"
-            )
             w13_blockscale_swizzled = swizzle_blockscale(layer.w13_weight_scale)
             layer.w13_weight_scale = Parameter(
                 w13_blockscale_swizzled, requires_grad=False
             )
-            assert layer.w2_weight_scale.shape[2] % 16 == 0, (
-                "Expected weight_scale.dim(1) to be divisible by 16"
-            )
-            assert layer.w2_weight_scale.dtype == torch.float8_e4m3fn, (
-                "Weight Blockscale must be represented as FP8-E4M3"
-            )
             w2_blockscale_swizzled = swizzle_blockscale(layer.w2_weight_scale)
             layer.w2_weight_scale = Parameter(
                 w2_blockscale_swizzled, requires_grad=False
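To make the removed constraint concrete, the following is a minimal plain-Python sketch of the condition that the deleted shape assertions enforced on the third dimension of each block-scale tensor. The helper name and the example shapes are hypothetical and chosen only for illustration; they are not part of vLLM. Note that the deleted assertion message refers to dim(1), although the check itself reads shape[2].

```python
def passes_removed_shape_check(scale_shape: tuple[int, ...]) -> bool:
    """Mirror of the deleted condition: weight_scale.shape[2] % 16 == 0."""
    return scale_shape[2] % 16 == 0


# Hypothetical 3-D block-scale shapes, used only to illustrate the check.
examples = [(8, 256, 64), (8, 256, 24)]

# Map each shape to whether the old assertion would have accepted it.
results = {shape: passes_removed_shape_check(shape) for shape in examples}

for shape, ok in results.items():
    print(shape, "accepted" if ok else "rejected by the removed assertion")
```

Before this commit, a shape like the second one would have failed the assertion in this branch; afterwards it is passed straight to swizzle_blockscale. Whether a given kernel then handles such a shape correctly is outside what this diff shows.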