change / sglang
Commit 5be8f1ed (Unverified)
Authored Mar 05, 2025 by yigex
Committed by GitHub, Mar 05, 2025

ROCM: AITER BLOCK GEMM (#4075)

parent e5760bc4
Showing 1 changed file with 14 additions and 1 deletion

python/sglang/srt/layers/quantization/fp8_utils.py (+14 -1)
@@ -8,9 +8,12 @@ from sglang.srt.layers.quantization.fp8_kernel import (
     per_token_group_quant_fp8,
     w8a8_block_fp8_matmul,
 )
-from sglang.srt.utils import is_hip
+from sglang.srt.utils import get_bool_env_var, is_hip
 
 is_hip_ = is_hip()
+if is_hip_ and get_bool_env_var("CK_MOE"):
+    from aiter import gemm_a8w8_blockscale
+
 _is_cuda = torch.cuda.is_available() and torch.version.cuda
 if _is_cuda:
     from sgl_kernel import fp8_blockwise_scaled_mm
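Note on the hunk above: the `aiter` import is gated on both the ROCm check and the `CK_MOE` environment variable, and the same condition is re-evaluated inside `apply_w8a8_block_fp8_linear`. A minimal sketch of that selection logic follows. It assumes a stand-in `get_bool_env_var` (the real helper in `sglang.srt.utils` may accept other values), and `select_block_fp8_backend` and its `use_cuda_kernel` parameter are hypothetical names for illustration; the condition guarding the first branch is not visible in this diff.

```python
import os


def get_bool_env_var(name: str, default: str = "false") -> bool:
    # Stand-in for sglang.srt.utils.get_bool_env_var (assumption: the real
    # helper may treat other strings as true).
    return os.getenv(name, default).strip().lower() in ("1", "true")


def select_block_fp8_backend(use_cuda_kernel: bool, is_hip: bool) -> str:
    # Hypothetical helper mirroring the if / elif / else order in the diff.
    # `use_cuda_kernel` stands for whatever condition guards the first
    # branch, which the hunk does not show.
    if use_cuda_kernel:
        return "fp8_blockwise_scaled_mm"
    elif is_hip and get_bool_env_var("CK_MOE"):
        return "gemm_a8w8_blockscale"
    # The body of the final else branch is truncated in the diff.
    return "fallback"
```

With `CK_MOE=1` on a ROCm build this returns `"gemm_a8w8_blockscale"`; with the variable unset, the ROCm path falls through to the last branch.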
@@ -78,6 +81,16 @@ def apply_w8a8_block_fp8_linear(
         output = fp8_blockwise_scaled_mm(
             q_input, weight.T, x_scale, weight_scale.T, out_dtype=input.dtype
         )
+    elif is_hip_ and get_bool_env_var("CK_MOE"):
+        q_input, x_scale = per_token_group_quant_fp8(
+            input_2d, block_size[1], column_major_scales=False
+        )
+        output = torch.zeros(
+            [q_input.shape[0], weight.shape[0]],
+            dtype=input.dtype,
+            device=q_input.device,
+        )
+        gemm_a8w8_blockscale(q_input, weight, x_scale, weight_scale, output)
     else:
         q_input, x_scale = per_token_group_quant_fp8(
             input_2d, block_size[1], column_major_scales=False
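Note on the new ROCm branch: the kernel is handed quantized activations `q_input` of shape `[M, K]`, the weight of shape `[N, K]`, one activation scale per row and K-group, and a preallocated `[M, N]` output. The pure-Python sketch below shows what a block-scaled GEMM of that shape is generally expected to compute. It is a reference for the arithmetic only, not the `aiter` implementation; the function name is hypothetical, and the weight-scale layout `[ceil(N / block_n), ceil(K / block_k)]` is an assumption that the diff does not confirm.

```python
def blockscale_gemm_reference(xq, wq, x_scale, w_scale, block_n, block_k):
    """Reference block-scaled GEMM on nested lists (hypothetical helper).

    xq:      [M][K] quantized activations
    wq:      [N][K] quantized weights
    x_scale: [M][ceil(K / block_k)] per-row, per-K-group activation scales
    w_scale: [ceil(N / block_n)][ceil(K / block_k)] per-block weight scales
             (assumed layout)
    Returns an [M][N] list of floats.
    """
    M, K, N = len(xq), len(xq[0]), len(wq)
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k0 in range(0, K, block_k):
                kb = k0 // block_k
                # Dot product over one K-block in the quantized domain.
                partial = sum(
                    xq[m][k] * wq[n][k] for k in range(k0, min(k0 + block_k, K))
                )
                # Apply both scales once per K-block.
                acc += partial * x_scale[m][kb] * w_scale[n // block_n][kb]
            out[m][n] = acc
    return out


xq = [[1, 2, 3, 4]]                   # M=1, K=4
wq = [[1, 1, 1, 1], [2, 0, 0, 1]]     # N=2, K=4
x_scale = [[0.5, 2.0]]                # one scale per row per K-block
w_scale = [[1.0, 0.25]]               # one N-block, two K-blocks
result = blockscale_gemm_reference(xq, wq, x_scale, w_scale, block_n=2, block_k=2)
print(result)  # [[5.0, 3.0]]
```

The output shape matches the `torch.zeros([q_input.shape[0], weight.shape[0]], ...)` buffer allocated in the diff, and the K-group size corresponds to `block_size[1]` passed to `per_token_group_quant_fp8`.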