OpenDAS / vllm_cscc
Commit 2164aab4, authored Nov 26, 2025 by wanglong3

Update vllm/_custom_ops.py, vllm/model_executor/layers/quantization/utils/w8a8_utils.py

parent ade7db0c

Showing 2 changed files with 4 additions and 2 deletions

vllm/_custom_ops.py  +3 -1
vllm/model_executor/layers/quantization/utils/w8a8_utils.py  +1 -1

vllm/_custom_ops.py @ 2164aab4

@@ -1150,6 +1150,8 @@ def blaslt_scaled_mm(a: torch.Tensor,
     n = b.shape[0]
     k = a.shape[1]
     _, out = quant_ops.hipblaslt_w8a8_gemm(a, b, scale_a, scale_b, m, n, k, 'NT', out_dtype)
+    if bias is not None:
+        out += bias
     return out
 
 def triton_scaled_mm(a: torch.Tensor,
@@ -2486,4 +2488,4 @@ direct_register_custom_op(
     op_func=awq_gemm,
     mutates_args=[],
     fake_impl=awq_gemm_fake,
-)
\ No newline at end of file
+)

vllm/model_executor/layers/quantization/utils/w8a8_utils.py @ 2164aab4

@@ -504,7 +504,7 @@ def apply_int8_linear(
                                      scale_a=x_scale,
                                      scale_b=weight_scale,
                                      out_dtype=input.dtype,
-                                     bias=None)
+                                     bias=bias)
     else:
         return ops.rocblas_scaled_mm(x_q,
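
For readers who want to check the intent of the change, here is a minimal pure-Python sketch of a scaled int8 matmul with an optional bias. It is not the hipBLASLt kernel or vLLM's API: the function name scaled_mm_reference, the per-tensor scale_a, and the per-output-channel scale_b are assumptions made for illustration. It only shows that passing bias=None drops the bias term, which is what the diff appears to fix.

```python
from typing import List, Optional


def scaled_mm_reference(
    a_q: List[List[int]],
    b_q: List[List[int]],
    scale_a: float,
    scale_b: List[float],
    bias: Optional[List[float]] = None,
) -> List[List[float]]:
    """Hypothetical reference: out = (a_q @ b_q^T) * scale_a * scale_b (+ bias).

    a_q holds m rows of length k and b_q holds n rows of length k, mirroring
    the 'NT' layout in the hunk (n = b.shape[0], k = a.shape[1]).
    Assumption: scale_a is per-tensor and scale_b is per-output-channel.
    """
    m, k, n = len(a_q), len(a_q[0]), len(b_q)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Integer accumulation, then dequantize with both scales.
            acc = sum(a_q[i][p] * b_q[j][p] for p in range(k))
            out[i][j] = acc * scale_a * scale_b[j]
    # Same guard as the lines added to blaslt_scaled_mm: bias is optional.
    if bias is not None:
        for i in range(m):
            for j in range(n):
                out[i][j] += bias[j]
    return out


if __name__ == "__main__":
    a_q = [[1, 2], [3, 4]]
    b_q = [[1, 0], [0, 1], [1, 1]]
    scale_b = [1.0, 2.0, 1.0]
    bias = [10.0, 20.0, 30.0]
    print(scaled_mm_reference(a_q, b_q, 0.5, scale_b))        # bias dropped
    print(scaled_mm_reference(a_q, b_q, 0.5, scale_b, bias))  # bias applied
```

Comparing the two calls against each other is a quick way to confirm that a caller forwarding bias=None produces a result that differs from the biased one by exactly the bias row.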