OpenDAS / vllm_cscc
Commit f79d9dce (unverified)
Authored Feb 06, 2026 by Fadi Arafeh; committed by GitHub on Feb 06, 2026
[CPU][BugFix] Fix loading of w8a8int models with bias (#33582)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
parent ba5cbbf1
Showing 1 changed file with 8 additions and 3 deletions
vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit.py
@@ -86,9 +86,14 @@ class Dynamic4bitLinearKernel(MPLinearKernel):
             )  # Float32 & Bfloat16 variants requires float32 scales
             scales = scales.view(-1, 1)  # Channel-wise scales
             if layer.bias is not None:
-                layer.bias = layer.bias.to(
-                    torch.float32
-                )  # Float32 & Bfloat16 variants requires float32 bias
+                # Float32 & Bfloat16 variants requires float32 bias
+                replace_parameter(
+                    layer,
+                    "bias",
+                    torch.nn.Parameter(
+                        layer.bias.to(torch.float32), requires_grad=False
+                    ),
+                )
         else:
             # KleidiAI kernel requires bfloat16 scales with groupwise scheme
             scales = scales.to(torch.bfloat16)
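
The commit message does not say why the plain assignment broke loading, so the following is an assumption rather than the author's stated reasoning. In PyTorch, a name registered as a module parameter only accepts a `torch.nn.Parameter` or `None`, and casting a parameter to another dtype yields a plain tensor, so the removed line would be expected to raise a `TypeError` whenever the bias is not already float32. A minimal sketch in plain PyTorch, where `to_float32_param` is a hypothetical stand-in for the `replace_parameter` helper used in the diff:

```python
import torch


def to_float32_param(layer: torch.nn.Module, name: str) -> None:
    """Hypothetical stand-in for the replace_parameter call in the diff."""
    old = getattr(layer, name)
    # Wrap the converted tensor in a Parameter so the module accepts it.
    new = torch.nn.Parameter(old.detach().to(torch.float32), requires_grad=False)
    setattr(layer, name, new)


# Assumed setup: a layer whose bias was loaded in a non-float32 dtype.
layer = torch.nn.Linear(4, 2)
layer.bias = torch.nn.Parameter(
    torch.zeros(2, dtype=torch.bfloat16), requires_grad=False
)

# Old pattern: assign the converted (plain) tensor over the parameter.
try:
    layer.bias = layer.bias.to(torch.float32)
    plain_assignment_failed = False
except TypeError:
    plain_assignment_failed = True

# New pattern: replace the parameter with a float32 Parameter.
to_float32_param(layer, "bias")
```

If the bias is already float32, `.to(torch.float32)` returns the same parameter object and the old assignment succeeds, which would explain why only some checkpoints were affected.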