[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder (#25701)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

Commit 13dd93c6, authored Sep 25, 2025 by Russell Bryant, committed by GitHub Sep 25, 2025

Showing with 4 additions and 2 deletions

vllm/config/__init__.py +4 -2

--- a/vllm/config/__init__.py
+++ b/vllm/config/__init__.py
@@ -364,9 +364,11 @@ class VllmConfig:
                    self.compilation_config.cudagraph_mode = \
                        CUDAGraphMode.FULL_AND_PIECEWISE

-                    # pooling model does not support full cudagraphs
+                    # pooling models and encoder-decoder models
+                    # do not support full cudagraphs
                    if self.model_config is not None and \
-                        self.model_config.pooler_config is not None:
+                        (self.model_config.pooler_config is not None
+                         or self.model_config.is_encoder_decoder):
                        self.compilation_config.cudagraph_mode = \
                            CUDAGraphMode.PIECEWISE
                else:
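
Note on the hunk above: the fallback it changes can be read as a small pure function. The following is a minimal sketch of that selection logic only, not vLLM's actual code. `CUDAGraphMode` and `ModelConfig` here are simplified stand-ins that carry just the members and fields named in the diff, and `pick_cudagraph_mode` is a hypothetical helper introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CUDAGraphMode(Enum):
    # Stand-in enum: only the two members that appear in the diff.
    PIECEWISE = "piecewise"
    FULL_AND_PIECEWISE = "full_and_piecewise"


@dataclass
class ModelConfig:
    # Stand-in config: only the two fields the diff inspects.
    pooler_config: Optional[dict] = None
    is_encoder_decoder: bool = False


def pick_cudagraph_mode(model_config: Optional[ModelConfig]) -> CUDAGraphMode:
    """Hypothetical helper mirroring the branch touched by the diff."""
    # Default chosen by the surrounding code in the hunk.
    mode = CUDAGraphMode.FULL_AND_PIECEWISE
    # Pooling models and encoder-decoder models do not support full
    # cudagraphs, so both fall back to PIECEWISE.
    if model_config is not None and (
        model_config.pooler_config is not None
        or model_config.is_encoder_decoder
    ):
        mode = CUDAGraphMode.PIECEWISE
    return mode
```

Under these assumptions, a config with `is_encoder_decoder=True` yields `PIECEWISE`, as does one with a non-`None` `pooler_config`. A plain decoder-only config, or no model config at all, keeps `FULL_AND_PIECEWISE`.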