Commit fea80060 authored by Tyler Michael Smith, committed by GitHub

[Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531)


Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
parent e6750d0b
@@ -186,11 +186,12 @@ class CudaPlatformBase(Platform):
# if torch compile cache key issue fixed
# See https://github.com/vllm-project/vllm/pull/25093
logger.info(
-"Data Parallel: disabling cudagraphs since DP "
-"with DeepEP high-throughput kernels are not CUDA Graph "
-"compatible. The DeepEP low-latency kernels are CUDA Graph "
-"compatible. Set the all_to_all backend to deepep_low_latency "
-"to use those kernels instead.")
+"WideEP: Disabling CUDA Graphs since DeepEP high-throughput "
+"kernels are optimized for prefill and are incompatible with "
+"CUDA Graphs. "
+"In order to use CUDA Graphs for decode-optimized workloads, "
+"set VLLM_ALL2ALL_BACKEND to another option, such as "
+"deepep_low_latency, pplx, or allgather_reducescatter.")
compilation_config.cudagraph_mode = CUDAGraphMode.NONE
@classmethod
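
The new message tells users to select a different all-to-all backend through `VLLM_ALL2ALL_BACKEND`. The sketch below illustrates that decision in isolation; it is not vLLM's implementation. The helper `cudagraphs_allowed` is hypothetical, and the value `deepep_high_throughput` is an assumed name for the high-throughput backend, which the diff does not spell out. Only the variable name and the three suggested values come from the log text.

```python
import os

# Alternatives named in the log message as usable with CUDA Graphs.
SUGGESTED_BACKENDS = ("deepep_low_latency", "pplx", "allgather_reducescatter")

# Assumption: the identifier for the DeepEP high-throughput backend.
ASSUMED_HT_BACKEND = "deepep_high_throughput"


def cudagraphs_allowed(env=None):
    """Hypothetical helper: False when the assumed HT backend is selected."""
    env = os.environ if env is None else env
    backend = env.get("VLLM_ALL2ALL_BACKEND", "")
    return backend != ASSUMED_HT_BACKEND


# Example: choose one of the suggested backends, as the log message advises.
example_env = {"VLLM_ALL2ALL_BACKEND": SUGGESTED_BACKENDS[0]}
print(cudagraphs_allowed(example_env))
```

In practice the variable would be set in the environment before the serving process starts, since the check in the diff runs during platform configuration.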