OpenDAS / vllm_cscc
Commit fd03538b (Unverified)
Authored Feb 05, 2026 by Fadi Arafeh
Committed by GitHub on Feb 05, 2026
[CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
parent 1f70313e
Showing 1 changed file with 3 additions and 1 deletion
vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py
...
...
@@ -182,6 +182,8 @@ class CPUInt8ScaledMMLinearKernel(Int8ScaledMMLinearKernel):
        x: torch.Tensor,
        bias: torch.Tensor | None = None,
    ) -> torch.Tensor:
        x_shape = x.shape
        x = x.reshape(-1, x_shape[-1]) if len(x_shape) > 2 else x
        w_q, w_s, i_s, i_zp, azp_adj = self._get_layer_params(layer)
        # ops.scaled_int8_quant supports both dynamic and static quant:
...
...
@@ -195,7 +197,7 @@ class CPUInt8ScaledMMLinearKernel(Int8ScaledMMLinearKernel):
        n = self.dnnl_handler.n
        out = torch.empty((m, n), dtype=x.dtype)
        ops.onednn_scaled_mm(self.dnnl_handler, x_q, out, x_s, x_zp, azp_adj, bias)
        out = out.reshape(x_shape[:-1] + (n,)) if len(x_shape) > 2 else out
        return out

    def _apply_weights_sgl(
...
...
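The change above follows a common pattern: collapse the leading dimensions of a 3D input so that a 2D-only kernel can process it, then restore those dimensions on the output. The sketch below illustrates that pattern with plain nested lists, so it runs without torch or oneDNN. The names `matmul_2d` and `apply_with_leading_dims` are hypothetical stand-ins and are not part of vLLM; `matmul_2d` only plays the role of a kernel that accepts 2D input.

```python
def matmul_2d(rows, weight):
    """Hypothetical stand-in for a 2D-only kernel: (m, k) @ (k, n) -> (m, n)."""
    cols = list(zip(*weight))  # columns of the (k, n) weight matrix
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in rows]


def apply_with_leading_dims(x, weight):
    """Accept x as (m, k) or (batch, seq, k) nested lists.

    Mirrors the idea in the diff: flatten to 2D only when the input has
    more than two dimensions, and reshape the output back afterwards.
    """
    is_3d = isinstance(x[0][0], list)
    if not is_3d:
        return matmul_2d(x, weight)

    batch, seq = len(x), len(x[0])
    flat = [row for block in x for row in block]  # (batch * seq, k)
    out = matmul_2d(flat, weight)                 # (batch * seq, n)
    # Regroup rows so the output is (batch, seq, n).
    return [out[b * seq:(b + 1) * seq] for b in range(batch)]


x_3d = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape (2, 2, 2)
w = [[1, 0, 1], [0, 1, 1]]                   # shape (2, 3)
result = apply_with_leading_dims(x_3d, w)    # shape (2, 2, 3)
```

As in the commit, 2D inputs skip the flatten and regroup steps, so existing 2D callers see no change in behavior.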