OpenDAS / vllm_cscc
Commit e626d286 (unverified)
Authored Jul 27, 2025 by TJian; committed by GitHub on Jul 28, 2025

[FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel (#21242)

Parent: c7ffe93d

Showing 1 changed file with 13 additions and 2 deletions

vllm/model_executor/layers/quantization/utils/fp8_utils.py (+13 -2)

@@ -82,6 +82,13 @@ if current_platform.is_rocm():
         fake_impl=rocm_aiter_gemm_w8a8_blockscale_fake,
         dispatch_key=current_platform.dispatch_key,
     )
+    if (envs.VLLM_ROCM_USE_AITER and envs.VLLM_ROCM_USE_AITER_LINEAR
+            and current_platform.is_fp8_fnuz()):
+        import aiter as rocm_aiter
+        from aiter import get_hip_quant
+        aiter_per1x128_quant = get_hip_quant(rocm_aiter.QuantType.per_1x128)

 def dispatch_w8a8_blockscale_func(
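
The guard added in this hunk resolves the AITER quantization function only when both environment flags are set and the platform uses the FNUZ FP8 format. The sketch below illustrates that gating pattern without any vLLM or AITER dependency. `Flags`, `resolve_quant_fn`, and the `load_fast` callback are hypothetical names introduced for illustration; they are not part of either library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Flags:
    # Hypothetical stand-ins for envs.VLLM_ROCM_USE_AITER,
    # envs.VLLM_ROCM_USE_AITER_LINEAR and current_platform.is_fp8_fnuz().
    use_aiter: bool
    use_aiter_linear: bool
    is_fp8_fnuz: bool


def resolve_quant_fn(flags: Flags,
                     load_fast: Callable[[], Callable]) -> Optional[Callable]:
    """Return the optional fast kernel only if every gate is open.

    load_fast is invoked lazily, so the optional dependency is never
    imported on platforms that cannot use it.
    """
    if flags.use_aiter and flags.use_aiter_linear and flags.is_fp8_fnuz:
        return load_fast()
    return None
```

Keeping the import inside the guard means a missing optional package only matters on configurations that actually request it.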

@@ -178,8 +185,12 @@ def apply_w8a8_block_fp8_linear(
             block_size, input.dtype)
     else:
-        q_input, x_scale = per_token_group_quant_fp8(
-            input_2d, block_size[1], column_major_scales=use_cutlass)
+        if use_aiter_and_is_supported:
+            q_input, x_scale = aiter_per1x128_quant(
+                input_2d.contiguous(), quant_dtype=rocm_aiter.dtypes.fp8)
+        else:
+            q_input, x_scale = per_token_group_quant_fp8(
+                input_2d, block_size[1], column_major_scales=use_cutlass)

         output = w8a8_blockscale_func(q_input, weight, x_scale, weight_scale,
                                       block_size, input.dtype)
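
Both quantization paths in this hunk return a quantized input together with a tensor of scales. The name `per_1x128` suggests one scale per group of 128 contiguous values within each row. The pure-Python sketch below illustrates that idea under stated assumptions: `quantize_rows_per_group` is a hypothetical helper, the FP8 maximum of 240.0 is assumed for the e4m3fnuz format, and rounding to the real FP8 grid is omitted. The actual AITER kernel may differ in rounding, scale layout, and the maximum it uses.

```python
from typing import List, Tuple

FP8_MAX = 240.0  # assumed finite maximum of the e4m3fnuz format


def quantize_rows_per_group(
    rows: List[List[float]], group_size: int = 128, fp8_max: float = FP8_MAX
) -> Tuple[List[List[float]], List[List[float]]]:
    """Scale each contiguous group of `group_size` values in a row so that
    its largest magnitude maps to fp8_max.

    Returns (scaled values, per-group scales). Values stay Python floats;
    no rounding to an FP8 grid is performed.
    """
    q_rows, scale_rows = [], []
    for row in rows:
        if len(row) % group_size != 0:
            raise ValueError("row length must be a multiple of group_size")
        q_row, scales = [], []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            amax = max(abs(v) for v in group)
            # All-zero groups get a scale of 1.0 to avoid dividing by zero.
            scale = amax / fp8_max if amax > 0.0 else 1.0
            q_row.extend(max(-fp8_max, min(fp8_max, v / scale)) for v in group)
            scales.append(scale)
        q_rows.append(q_row)
        scale_rows.append(scales)
    return q_rows, scale_rows
```

Multiplying a scaled value by its group's scale recovers the original input up to floating-point error, which is what lets the block-scaled GEMM fold the scales back in after the low-precision multiply.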