[Perf] Apply torch.compile for `per_block_cast_to_fp8` (#24611)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Commit 9949aa2e, authored Sep 22, 2025 by Wentao Ye, committed by GitHub Sep 22, 2025

Showing with 2 additions and 2 deletions

vllm/utils/deep_gemm.py +2 -2

--- a/vllm/utils/deep_gemm.py
+++ b/vllm/utils/deep_gemm.py
@@ -135,7 +135,7 @@ DEFAULT_BLOCK_SIZE = [128, 128]
 # Taken from https://github.com/deepseek-ai/DeepGEMM/blob/dd6ed14acbc7445dcef224248a77ab4d22b5f240/deep_gemm/utils/math.py#L38
-# TODO(wentao): optimize this function, using triton or cuda kernel
+@torch.compile(dynamic=True, backend=current_platform.simple_compile_backend)
 def per_block_cast_to_fp8(
        x: torch.Tensor,
        block_size: list[int] = DEFAULT_BLOCK_SIZE,
@@ -187,4 +187,4 @@ __all__ = [
    "is_deep_gemm_e8m0_used",
    "is_deep_gemm_supported",
    "should_use_deepgemm_for_fp8_linear",
 ]
\ No newline at end of file
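
For readers unfamiliar with per-block FP8 casting, the following is a minimal pure-Python sketch of the scale computation that a function like `per_block_cast_to_fp8` performs before it is compiled. It is not the vLLM or DeepGEMM implementation: the helper name `per_block_scales`, the tiny `(2, 2)` default block, and the `1e-4` clamp are assumptions chosen for illustration, and `448.0` is assumed to be the largest finite value of the `float8_e4m3fn` format. The real function operates on `torch.Tensor` inputs with `DEFAULT_BLOCK_SIZE = [128, 128]`.

```python
import math

# Assumption: largest finite magnitude representable in float8_e4m3fn.
FP8_E4M3_MAX = 448.0


def per_block_scales(x, block=(2, 2)):
    """Return one scale per (block[0] x block[1]) tile of the 2-D list x.

    Hypothetical helper for illustration only. Each tile's scale is its
    maximum absolute value divided by FP8_E4M3_MAX, so that dividing the
    tile by its scale maps it into the representable FP8 range.
    """
    rows, cols = len(x), len(x[0])
    bm, bn = block
    n_row_blocks = math.ceil(rows / bm)
    n_col_blocks = math.ceil(cols / bn)
    scales = []
    for bi in range(n_row_blocks):
        row_scales = []
        for bj in range(n_col_blocks):
            amax = 0.0
            # Partial tiles at the edges are treated as if zero-padded.
            for i in range(bi * bm, min((bi + 1) * bm, rows)):
                for j in range(bj * bn, min((bj + 1) * bn, cols)):
                    amax = max(amax, abs(x[i][j]))
            # Assumed clamp: avoids a zero scale for all-zero tiles.
            amax = max(amax, 1e-4)
            row_scales.append(amax / FP8_E4M3_MAX)
        scales.append(row_scales)
    return scales


if __name__ == "__main__":
    demo = [[448.0, 1.0, 0.0],
            [-2.0, 3.0, 0.0]]
    print(per_block_scales(demo))
```

The work here is elementwise arithmetic and reductions over tiles, which is the kind of function where `torch.compile` can plausibly fuse the tensor operations into fewer kernels. That is presumably the motivation for replacing the TODO with the decorator in this commit.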