OpenDAS / vllm_cscc
Commit 9949aa2e (Unverified)
Authored Sep 22, 2025 by Wentao Ye
Committed by GitHub, Sep 22, 2025
[Perf] Apply torch.compile for `per_block_cast_to_fp8` (#24611)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
parent 0b7bed9c
Showing 1 changed file with 2 additions and 2 deletions
vllm/utils/deep_gemm.py
...
...
@@ -135,7 +135,7 @@ DEFAULT_BLOCK_SIZE = [128, 128]
# Taken from https://github.com/deepseek-ai/DeepGEMM/blob/dd6ed14acbc7445dcef224248a77ab4d22b5f240/deep_gemm/utils/math.py#L38
# TODO(wentao): optimize this function, using triton or cuda kernel
@torch.compile(dynamic=True, backend=current_platform.simple_compile_backend)
def per_block_cast_to_fp8(
    x: torch.Tensor,
    block_size: list[int] = DEFAULT_BLOCK_SIZE,
...
...
@@ -187,4 +187,4 @@ __all__ = [
    "is_deep_gemm_e8m0_used",
    "is_deep_gemm_supported",
    "should_use_deepgemm_for_fp8_linear",
]
]
\ No newline at end of file
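
For readers unfamiliar with block-wise quantization, the following is a minimal, dependency-free sketch of the kind of per-block scale computation that a function like `per_block_cast_to_fp8` performs. It is not the vLLM or DeepGEMM implementation: the helper name `per_block_scales`, the nested-list input, and the `1e-4` floor are assumptions made for illustration, and the sketch stops short of producing FP8 values. The constant 448 is the largest finite value of the `float8_e4m3fn` format.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def per_block_scales(x, block_size=(128, 128)):
    """Return one scale factor per (block_m x block_n) tile of a 2-D list.

    Hypothetical helper for illustration only: a real implementation works
    on tensors and also returns the scaled values cast to FP8.
    """
    block_m, block_n = block_size
    rows, cols = len(x), len(x[0])
    scales = []
    for r0 in range(0, rows, block_m):
        row_scales = []
        for c0 in range(0, cols, block_n):
            # Largest magnitude inside this tile (edge tiles may be smaller).
            amax = max(
                abs(x[r][c])
                for r in range(r0, min(r0 + block_m, rows))
                for c in range(c0, min(c0 + block_n, cols))
            )
            amax = max(amax, 1e-4)  # assumed floor so all-zero tiles get a nonzero scale
            row_scales.append(amax / FP8_E4M3_MAX)
        scales.append(row_scales)
    return scales
```

Dividing every element of a tile by that tile's scale maps the tile's largest magnitude to 448, which is why each block carries its own scale. The work is a chain of elementwise operations and reductions, the kind of code that `torch.compile` can often fuse into fewer kernels.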