OpenDAS / vllm_cscc
Commit 5fbfa8d9
Authored Dec 19, 2025 by Jinzhen Lin; committed by GitHub on Dec 19, 2025
[Quantization] fix marlin w8a8 check (#30961)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
parent 23a1946e
Showing 1 changed file with 3 additions and 6 deletions
vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py
@@ -11,7 +11,6 @@ from vllm.model_executor.layers.quantization.utils.marlin_utils import (
     marlin_make_workspace_new,
     marlin_permute_bias,
     marlin_permute_scales,
-    marlin_quant_input,
     should_use_atomic_add_reduce,
 )
 from vllm.model_executor.utils import replace_parameter
@@ -63,13 +62,11 @@ def apply_fp8_marlin_linear(
     inputs = reshaped_x
     a_scales = None
     if input_dtype is not None and input_dtype.itemsize == 1:
-        if input_dtype != torch.float8_e4m3fn:
-            raise RuntimeError("FP8 weight + INT8 activation is not supported.")
-
-        inputs, a_scales = marlin_quant_input(inputs, torch.float8_e4m3fn)
+        # inputs, a_scales = marlin_quant_input(inputs, torch.float8_e4m3fn)
+        raise RuntimeError("Marlin W8A8 is not supported.")
 
     output = ops.gptq_marlin_gemm(
-        a=reshaped_x,
+        a=inputs,
         c=None,
         b_q_weight=weight,
         b_bias=bias,
...
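The effect of the changed guard can be sketched in isolation. The snippet below is a minimal illustration, not the vLLM code: `FakeDtype`, `check_old`, `check_new`, and the returned strings are hypothetical stand-ins so the example runs without PyTorch. Only the conditions and the error messages are taken from the diff. It assumes the old guard rejected any 1-byte input dtype other than `float8_e4m3fn`, while the new guard rejects every 1-byte input dtype.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeDtype:
    """Hypothetical stand-in for torch.dtype; models only a name and itemsize."""
    name: str
    itemsize: int  # size of one element in bytes


FLOAT8_E4M3FN = FakeDtype("float8_e4m3fn", 1)
INT8 = FakeDtype("int8", 1)
BFLOAT16 = FakeDtype("bfloat16", 2)


def check_old(input_dtype: Optional[FakeDtype]) -> str:
    """Guard before the commit: 1-byte dtypes other than FP8 were rejected."""
    if input_dtype is not None and input_dtype.itemsize == 1:
        if input_dtype != FLOAT8_E4M3FN:
            raise RuntimeError("FP8 weight + INT8 activation is not supported.")
        return "quantize activations"  # illustrative label, not a vLLM value
    return "pass through"  # illustrative label, not a vLLM value


def check_new(input_dtype: Optional[FakeDtype]) -> str:
    """Guard after the commit: every 1-byte input dtype is rejected."""
    if input_dtype is not None and input_dtype.itemsize == 1:
        raise RuntimeError("Marlin W8A8 is not supported.")
    return "pass through"  # illustrative label, not a vLLM value
```

Under these assumptions, `check_new(None)` and `check_new(BFLOAT16)` return normally, while `check_new(FLOAT8_E4M3FN)` and `check_new(INT8)` both raise `RuntimeError`.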