Commit 055915e6 authored by Woosuk Kwon, committed by GitHub

Enable prefix caching with full cuda graphs (#19617)


Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
parent 3d330c4c
@@ -4495,7 +4495,6 @@ class VllmConfig:
 "full_cuda_graph is not supported with "
 "cascade attention. Disabling cascade attention.")
 self.model_config.disable_cascade_attn = True
-self.cache_config.enable_prefix_caching = False

 if (self.kv_events_config is not None
 and self.kv_events_config.enable_kv_cache_events
......
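
The effect of the hunk above can be illustrated with a small, self-contained sketch. It is not vLLM code: `ToyConfig` and `apply_full_cuda_graph_rules` are hypothetical names, and the guard on a `full_cuda_graph` flag is an assumption inferred from the warning text, since the hunk does not show the enclosing condition. The sketch only shows that, after this change, the branch still disables cascade attention but no longer touches the prefix caching setting.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the relevant fields; not vLLM's VllmConfig."""
    full_cuda_graph: bool = False
    disable_cascade_attn: bool = False
    enable_prefix_caching: bool = True


def apply_full_cuda_graph_rules(cfg: ToyConfig) -> ToyConfig:
    # Assumed guard: the real condition is outside the lines shown in the hunk.
    if cfg.full_cuda_graph:
        # Kept by the commit: cascade attention is still disabled.
        cfg.disable_cascade_attn = True
        # Removed by the commit: the branch previously also set
        # enable_prefix_caching = False. That line is gone, so the
        # user's prefix caching choice is left unchanged.
    return cfg


cfg = apply_full_cuda_graph_rules(ToyConfig(full_cuda_graph=True))
print(cfg.disable_cascade_attn, cfg.enable_prefix_caching)
```

In this toy model, a configuration that requests full CUDA graphs keeps whatever prefix caching value it started with, which is the behavior the commit title describes.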