Unverified commit 4f4713f9, authored by Chaojun Zhang and committed by GitHub

[XPU] [torch.compile] Skipping CUDA graph memory estimation to avoid startup errors. (#39977)


Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
parent 89361181
@@ -374,10 +374,14 @@ class Worker(WorkerBase):
...
             )
             # Profile CUDA graph memory if graphs will be captured.
-            # Skip on ROCm/HIP as graph pool handles and mem_get_info behave
+            # Skip on ROCm/HIP/XPU as graph pool handles and mem_get_info behave
             # differently and can produce incorrect/negative estimates.
             cudagraph_memory_estimate = 0
-            if not self.model_config.enforce_eager and not current_platform.is_rocm():
+            if (
+                not current_platform.is_rocm()
+                and self.vllm_config.compilation_config.cudagraph_mode
+                != CUDAGraphMode.NONE
+            ):
                 cudagraph_memory_estimate = self.model_runner.profile_cudagraph_memory()
             # Use the pre-cudagraph torch peak to avoid double-counting.
...
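The gating logic introduced by this change can be illustrated in isolation. The following is a minimal sketch, not the project's code: the `CUDAGraphMode` enum here is a hypothetical stand-in with illustrative members, and `should_profile_cudagraph_memory` and `estimate_cudagraph_memory` are helper names invented for this example. The visible condition contains no explicit XPU check, so the sketch assumes that a platform which cannot capture graphs opts out by having its mode set to `NONE`.

```python
from enum import Enum
from typing import Callable


class CUDAGraphMode(Enum):
    """Hypothetical stand-in for the enum referenced in the diff."""
    NONE = 0
    PIECEWISE = 1
    FULL = 2


def should_profile_cudagraph_memory(is_rocm: bool, mode: CUDAGraphMode) -> bool:
    """Mirror the new condition: profile only off ROCm and when graphs are enabled."""
    return (not is_rocm) and mode != CUDAGraphMode.NONE


def estimate_cudagraph_memory(
    is_rocm: bool,
    mode: CUDAGraphMode,
    profile_fn: Callable[[], int],
) -> int:
    """Return the profiled estimate, or 0 when profiling is skipped."""
    estimate = 0  # default, as in the diff
    if should_profile_cudagraph_memory(is_rocm, mode):
        estimate = profile_fn()  # invoked only when the gate passes
    return estimate
```

Under these assumptions, a configuration whose mode is `NONE` never invokes the profiler, so the estimate stays at its default of 0 regardless of platform.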