OpenDAS / vllm_cscc
Commit 1f19d8f8 (Unverified)
Authored Dec 12, 2025 by Xin Yang
Committed by GitHub, Dec 12, 2025
[Perf] Set split_k to 1 for triton_kernels (#30528)
Signed-off-by: Xin Yang <xyangx@amazon.com>

parent cd7740ac
Showing 1 changed file with 12 additions and 6 deletions

vllm/model_executor/layers/quantization/utils/mxfp4_utils.py (+12 -6)
@@ -57,12 +57,18 @@ def _swizzle_mxfp4(quant_tensor, scale, num_warps):
                 mx_axis=1, num_warps=num_warps
             )
         )
-    if current_platform.is_cuda() and current_platform.is_device_capability(100):
-        constraints = {
-            "is_persistent": True,
-            "epilogue_subtile": 1,
-        }
-        opt_flags.update_opt_flags_constraints(constraints)
+    if current_platform.is_cuda():
+        if current_platform.is_device_capability(90):
+            constraints = {
+                "split_k": 1,
+            }
+            opt_flags.update_opt_flags_constraints(constraints)
+        elif current_platform.is_device_capability(100):
+            constraints = {
+                "is_persistent": True,
+                "epilogue_subtile": 1,
+            }
+            opt_flags.update_opt_flags_constraints(constraints)
     # transpose the tensor so that the quantization axis is on dim1
     quant_tensor = quant_tensor.transpose(-2, -1)
     scale = scale.transpose(-2, -1)
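The per-device selection this change introduces can be sketched as a standalone function that runs without a GPU, vLLM, or triton_kernels. This is an illustration only: `select_constraints` and its parameters are hypothetical names that do not appear in the commit, and the sketch assumes that `is_device_capability(n)` is an exact-match check against a capability encoded as an integer (90, 100).

```python
from typing import Any, Optional


def select_constraints(is_cuda: bool, capability: Optional[int]) -> dict[str, Any]:
    """Hypothetical helper mirroring the branch structure after this commit.

    `capability` is assumed to be encoded as major * 10 + minor (e.g. 90, 100).
    Returns the constraints dict that would be passed to
    opt_flags.update_opt_flags_constraints, or an empty dict when the
    commit's code would not call it at all.
    """
    if not is_cuda:
        return {}
    if capability == 90:
        # New in this commit: pin split_k to 1 on capability 90 devices.
        return {"split_k": 1}
    if capability == 100:
        # Unchanged behavior: same constraints as before the commit.
        return {"is_persistent": True, "epilogue_subtile": 1}
    return {}


if __name__ == "__main__":
    for cap in (80, 90, 100):
        print(cap, select_constraints(True, cap))
```

Unlike the committed code, the sketch returns an empty dict where the original simply skips the update call; a caller would need to check for that before updating any flags.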