Enable prefix caching with full cuda graphs (#19617)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Commit 055915e6, authored Jun 15, 2025 by Woosuk Kwon, committed by GitHub Jun 15, 2025

Showing 1 changed file with 0 additions and 1 deletion

vllm/config.py +0 -1

--- a/vllm/config.py
+++ b/vllm/config.py
@@ -4495,7 +4495,6 @@ class VllmConfig:
                "full_cuda_graph is not supported with "
                "cascade attention. Disabling cascade attention.")
            self.model_config.disable_cascade_attn = True
-            self.cache_config.enable_prefix_caching = False

        if (self.kv_events_config is not None
                and self.kv_events_config.enable_kv_cache_events
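
The effect of the hunk above can be illustrated with a small standalone sketch. It is not vLLM code: the `Toy*` classes, the `normalize` method, and the `legacy` switch are hypothetical stand-ins, and the guarding `full_cuda_graph` condition is an assumption inferred from the log message in the hunk. Only the attribute names `disable_cascade_attn` and `enable_prefix_caching` come from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelConfig:
    disable_cascade_attn: bool = False


@dataclass
class ToyCacheConfig:
    enable_prefix_caching: bool = True


@dataclass
class ToyCompilationConfig:
    # Assumed flag name, inferred from the log message in the hunk.
    full_cuda_graph: bool = False


@dataclass
class ToyVllmConfig:
    model_config: ToyModelConfig = field(default_factory=ToyModelConfig)
    cache_config: ToyCacheConfig = field(default_factory=ToyCacheConfig)
    compilation_config: ToyCompilationConfig = field(
        default_factory=ToyCompilationConfig)

    def normalize(self, legacy: bool = False) -> None:
        """Mimic the config fix-up touched by this commit.

        legacy=True models the behavior before the commit (prefix caching
        forced off); legacy=False models the behavior after it.
        """
        if self.compilation_config.full_cuda_graph:
            # Unchanged by the commit: cascade attention is still disabled.
            self.model_config.disable_cascade_attn = True
            if legacy:
                # The line removed by the commit.
                self.cache_config.enable_prefix_caching = False


def make_config() -> ToyVllmConfig:
    """Build a toy config with full CUDA graphs and prefix caching on."""
    return ToyVllmConfig(
        compilation_config=ToyCompilationConfig(full_cuda_graph=True))


if __name__ == "__main__":
    before, after = make_config(), make_config()
    before.normalize(legacy=True)
    after.normalize()
    print("before:", before.cache_config.enable_prefix_caching)
    print("after: ", after.cache_config.enable_prefix_caching)
```

In this sketch, the user's prefix caching setting survives normalization after the change, while cascade attention is still turned off in both cases.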