Commit 4b68c4a5 (unverified), authored by Jialin Ouyang, committed by GitHub

[Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty (#27799)


Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
parent a8141fa6
@@ -306,6 +306,7 @@ class KVCacheManager:
                 "Computed blocks should be empty when prefix caching is disabled"
             )

+        if new_computed_block_list is not self.empty_kv_cache_blocks.blocks:
             # Append the new computed blocks to the request blocks until now to
             # avoid the case where the new blocks cannot be allocated.
             self.coordinator.save_new_computed_blocks(
...
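The guard in this hunk compares by identity (`is not`) against a shared empty object instead of checking length. A minimal sketch of that pattern follows, assuming a shared empty list plays the role of `self.empty_kv_cache_blocks.blocks`. `EMPTY_BLOCKS`, `CountingCoordinator`, and `allocate` are hypothetical names used only for illustration and are not part of the changed code.

```python
# Hypothetical sketch of the "skip the call when the argument is the shared
# empty sentinel" pattern. None of these names come from the commit itself.

EMPTY_BLOCKS: list = []  # one shared object standing in for "no computed blocks"


class CountingCoordinator:
    """Stand-in coordinator that records how often it is invoked."""

    def __init__(self):
        self.calls = 0
        self.saved = {}

    def save_new_computed_blocks(self, request_id, blocks):
        self.calls += 1
        self.saved.setdefault(request_id, []).extend(blocks)


def allocate(coordinator, request_id, new_computed_blocks=None):
    # Fall back to the shared sentinel when the caller passes nothing.
    block_list = (
        new_computed_blocks if new_computed_blocks is not None else EMPTY_BLOCKS
    )
    # Identity check: cheap, and true only for the sentinel object itself.
    if block_list is not EMPTY_BLOCKS:
        coordinator.save_new_computed_blocks(request_id, block_list)


coord = CountingCoordinator()
allocate(coord, "req-0")                # sentinel, so the call is skipped
allocate(coord, "req-1", ["b0", "b1"])  # real blocks, so the call happens
```

Because the check is on identity, a freshly constructed empty list passed by the caller would still reach `save_new_computed_blocks` in this sketch. Only the shared sentinel is skipped.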
@@ -151,7 +151,7 @@ class SingleTypeKVCacheManager(ABC):
             num_tokens: The total number of tokens that need to be cached
                 (including tokens that are already cached).
         """
-        num_cached_blocks = self.num_cached_block[request.request_id]
+        num_cached_blocks = self.num_cached_block.get(request.request_id, 0)
         num_full_blocks = num_tokens // self.block_size

         if num_cached_blocks >= num_full_blocks:
...
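One plausible reading of this hunk is that, once `save_new_computed_blocks` can be skipped, a request may reach this point without an entry in `num_cached_block`, so the lookup needs a default of 0. The sketch below shows that behaviour with a plain dict. The module-level `num_cached_block` and `block_size`, and the helper `blocks_to_cache`, are simplified hypothetical stand-ins and not the real method.

```python
# Hypothetical sketch: dict.get(key, 0) treats a request with no entry as
# having zero cached blocks, where plain indexing would raise KeyError.

block_size = 16
num_cached_block: dict[str, int] = {}


def blocks_to_cache(request_id: str, num_tokens: int) -> int:
    """Return how many newly full blocks would be cached for this request."""
    num_cached_blocks = num_cached_block.get(request_id, 0)
    num_full_blocks = num_tokens // block_size

    if num_cached_blocks >= num_full_blocks:
        return 0  # nothing new to cache

    num_cached_block[request_id] = num_full_blocks
    return num_full_blocks - num_cached_blocks


first = blocks_to_cache("req-0", 40)   # no entry yet: 40 // 16 = 2 full blocks
second = blocks_to_cache("req-0", 40)  # already cached, so nothing new
third = blocks_to_cache("req-0", 70)   # 70 // 16 = 4 full blocks, 2 are new
```

With `num_cached_block[request_id]` in place of `.get(request_id, 0)`, the first call in this sketch would raise `KeyError`, because nothing has registered `"req-0"` beforehand.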