[XPU] [torch.compile] Skipping CUDA graph memory estimation to avoid startup errors. (#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Commit 4f4713f9 authored Apr 20, 2026 by Chaojun Zhang; committed by GitHub Apr 20, 2026

Showing 1 changed file with 6 additions and 2 deletions

vllm/v1/worker/gpu_worker.py +6 -2

--- a/vllm/v1/worker/gpu_worker.py
+++ b/vllm/v1/worker/gpu_worker.py
@@ -374,10 +374,14 @@ class Worker(WorkerBase):
            )
            # Profile CUDA graph memory if graphs will be captured.
-            # Skip on ROCm/HIP as graph pool handles and mem_get_info behave
+            # Skip on ROCm/HIP/XPU as graph pool handles and mem_get_info behave
            # differently and can produce incorrect/negative estimates.
            cudagraph_memory_estimate = 0
-            if not self.model_config.enforce_eager and not current_platform.is_rocm():
+            if (
+                not current_platform.is_rocm()
+                and self.vllm_config.compilation_config.cudagraph_mode
+                != CUDAGraphMode.NONE
+            ):
                cudagraph_memory_estimate = self.model_runner.profile_cudagraph_memory()
        # Use the pre-cudagraph torch peak to avoid double-counting.
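
The new condition in the hunk above can be read as a small predicate. The following is a minimal Python sketch, not the vLLM code: `CUDAGraphMode` here is a stand-in enum with only two members, and `estimate_cudagraph_memory`, `is_rocm`, and `profile` are hypothetical names introduced for illustration. It also assumes that XPU runs reach this point with `cudagraph_mode` set to `NONE`, which the diff itself does not show.

```python
from enum import Enum
from typing import Callable


class CUDAGraphMode(Enum):
    """Stand-in for vLLM's enum (assumption: only the members needed here)."""

    NONE = 0
    PIECEWISE = 1


def estimate_cudagraph_memory(
    is_rocm: bool,
    cudagraph_mode: CUDAGraphMode,
    profile: Callable[[], int],
) -> int:
    """Return the profiled estimate in bytes, or 0 when profiling is skipped.

    `profile` is a hypothetical callable standing in for
    `self.model_runner.profile_cudagraph_memory()`.
    """
    # Same shape as the condition introduced by the diff: profile only
    # when the platform is not ROCm and graphs will actually be captured.
    if not is_rocm and cudagraph_mode != CUDAGraphMode.NONE:
        return profile()
    # Default mirrors `cudagraph_memory_estimate = 0` in the diff.
    return 0
```

Keying the check on `cudagraph_mode` rather than `enforce_eager` means the profile is skipped whenever no graphs will be captured, however that mode was arrived at.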