Unverified commit fea80060, authored by Tyler Michael Smith, committed by GitHub

[Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531)


Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
parent e6750d0b
@@ -186,11 +186,12 @@ class CudaPlatformBase(Platform):
                 # if torch compile cache key issue fixed
                 # See https://github.com/vllm-project/vllm/pull/25093
                 logger.info(
-                    "Data Parallel: disabling cudagraphs since DP "
-                    "with DeepEP high-throughput kernels are not CUDA Graph "
-                    "compatible. The DeepEP low-latency kernels are CUDA Graph "
-                    "compatible. Set the all_to_all backend to deepep_low_latency "
-                    "to use those kernels instead.")
+                    "WideEP: Disabling CUDA Graphs since DeepEP high-throughput "
+                    "kernels are optimized for prefill and are incompatible with "
+                    "CUDA Graphs. "
+                    "In order to use CUDA Graphs for decode-optimized workloads, "
+                    "set VLLM_ALL2ALL_BACKEND to another option, such as "
+                    "deepep_low_latency, pplx, or allgather_reducescatter.")
                 compilation_config.cudagraph_mode = CUDAGraphMode.NONE
     @classmethod
......
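For readers who want to see the rule the new log message describes, here is a minimal standalone sketch. It is not the vLLM implementation: the helper `cuda_graphs_allowed` is hypothetical, the value `deepep_high_throughput` is an assumed name for the DeepEP high-throughput backend, and treating an unset variable as CUDA Graph compatible is also an assumption. Only `VLLM_ALL2ALL_BACKEND` and the three alternative backend names come from the message itself.

```python
import os

# Alternatives named in the new log message as usable with CUDA Graphs.
CUDA_GRAPH_FRIENDLY_BACKENDS = (
    "deepep_low_latency",
    "pplx",
    "allgather_reducescatter",
)

# Assumption: the value that selects the DeepEP high-throughput kernels.
HIGH_THROUGHPUT_BACKEND = "deepep_high_throughput"


def cuda_graphs_allowed(env=None):
    """Hypothetical helper: False only when the high-throughput backend is selected."""
    env = os.environ if env is None else env
    # Assumption: an unset variable is treated as CUDA Graph compatible here.
    backend = env.get("VLLM_ALL2ALL_BACKEND", "")
    return backend != HIGH_THROUGHPUT_BACKEND


if __name__ == "__main__":
    for name in (HIGH_THROUGHPUT_BACKEND,) + CUDA_GRAPH_FRIENDLY_BACKENDS:
        print(name, cuda_graphs_allowed({"VLLM_ALL2ALL_BACKEND": name}))
```

In practice the message suggests exporting `VLLM_ALL2ALL_BACKEND=deepep_low_latency` (or one of the other listed options) before launching a decode-oriented deployment, so that CUDA Graphs are not disabled.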