OpenDAS / vllm_cscc
Commit 09c2856a, authored Nov 26, 2025 by zhuwenwen
Fix blaslt miss bias
parent 9be76efd
Showing 2 changed files with 3 additions and 1 deletion (+3 -1)
vllm/_custom_ops.py (+2 -0)
vllm/model_executor/layers/quantization/utils/w8a8_utils.py (+1 -1)
vllm/_custom_ops.py
@@ -1091,6 +1091,8 @@ def blaslt_scaled_mm(a: torch.Tensor,
     n = b.shape[0]
     k = a.shape[1]
     _, out = quant_ops.hipblaslt_w8a8_gemm(a, b, scale_a, scale_b, m, n, k, 'NT', out_dtype)
+    if bias is not None:
+        out += bias
     return out
 
 def triton_scaled_mm(a: torch.Tensor,
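The two added lines restore the bias term that the hipBLASLt path previously dropped. As a rough illustration of the intended result, here is a dependency-free reference sketch. The function name `scaled_mm_reference` is hypothetical. The per-row `scale_a`, the per-column `scale_b`, and the reading of `'NT'` as "`b` is stored as n x k" are assumptions inferred from `n = b.shape[0]` and `k = a.shape[1]`, not confirmed by the commit.

```python
def scaled_mm_reference(a, b, scale_a, scale_b, bias=None):
    """Reference for a scaled int8 GEMM with optional bias.

    Assumed layout: a is m x k, b is n x k, scale_a has m entries,
    scale_b has n entries, bias (if given) has n entries.
    """
    m, k = len(a), len(a[0])
    n = len(b)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Integer accumulation over the shared k dimension.
            acc = sum(a[i][p] * b[j][p] for p in range(k))
            # Dequantize with the (assumed) per-row and per-column scales.
            out[i][j] = acc * scale_a[i] * scale_b[j]
    if bias is not None:
        # The step this commit adds: broadcast the bias over every row.
        for i in range(m):
            for j in range(n):
                out[i][j] += bias[j]
    return out
```

Comparing the output with and without `bias` on a small input is one way to write a regression check for this fix.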
vllm/model_executor/layers/quantization/utils/w8a8_utils.py
@@ -555,7 +555,7 @@ def apply_int8_linear(
                                        scale_a=x_scale,
                                        scale_b=weight_scale,
                                        out_dtype=input.dtype,
-                                       bias=None)
+                                       bias=bias)
         else:
             return ops.rocblas_scaled_mm(
                 x_q,
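The call-site change matters because a hard-coded `bias=None` discards whatever bias the layer was given, regardless of what the backend supports. The toy sketch below shows the difference. All three function names are hypothetical stand-ins, and `backend_mm` uses an identity "matmul" so that only the bias handling is visible.

```python
def backend_mm(x, bias=None):
    # Stand-in for a GEMM backend: identity "matmul" plus optional bias.
    if bias is None:
        return list(x)
    return [v + b for v, b in zip(x, bias)]


def apply_linear_before(x, bias=None):
    # Pre-fix behavior: the caller's bias is silently dropped.
    return backend_mm(x, bias=None)


def apply_linear_after(x, bias=None):
    # Post-fix behavior: the caller's bias is forwarded to the backend.
    return backend_mm(x, bias=bias)
```

With `x = [1.0, 2.0]` and `bias = [0.5, 0.5]`, the first wrapper returns the input unchanged while the second returns the biased result.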