[Doc] Fix description in the Automatic Prefix Caching design doc (#19333)

Signed-off-by: cr7258 <chengzw258@163.com>

Unverified commit 0eca5eac, authored by Se7en on Jun 09, 2025 and committed by GitHub on Jun 09, 2025.

Showing 1 changed file with 1 addition and 1 deletion: docs/design/v1/prefix_caching.md (+1 -1)
--- a/docs/design/v1/prefix_caching.md
+++ b/docs/design/v1/prefix_caching.md
@@ -144,7 +144,7 @@ As a result, we will have the following components when the KV cache manager is

 **Running request:** Workflow for the scheduler to schedule a running request with KV cache block allocation:

-1. The scheduler calls `kv_cache_manager.append_slots()`. It does the following steps:  
+1. The scheduler calls `kv_cache_manager.allocate_slots()`. It does the following steps:  
   1. Compute the number of new required blocks, and return if there are no sufficient blocks to allocate.  
   2. Allocate new blocks by popping the heads of the free queue. If the head block is a cached block, this also “evicts” the block so that no other requests can reuse it anymore from now on.  
   3. Append token IDs to the slots in existing blocks as well as the new blocks. If a block is full, we add it to the Cache Block to cache it.
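The three steps of the `allocate_slots()` workflow for a running request can be sketched as below. This is a minimal, hypothetical Python model for illustration only, not vLLM's actual KV cache manager: the class `KVCacheManagerSketch`, its `free()` helper, the `Block` dataclass, and the chained `hash((parent_hash, tokens))` block hash are all assumptions. An `OrderedDict` stands in for the free queue (its head is the next block to hand out) and a dict keyed by block hash stands in for the Cache Block map. Prefix-cache hits for new requests are deliberately not modeled.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Block:
    block_id: int
    token_ids: list[int] = field(default_factory=list)
    block_hash: Optional[int] = None  # set once the block is full and cached


class KVCacheManagerSketch:
    """Hypothetical model of the running-request allocation path."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        # Free queue: insertion-ordered; the first entry is the head.
        self.free_queue: "OrderedDict[int, Block]" = OrderedDict(
            (i, Block(i)) for i in range(num_blocks))
        # Stand-in for the Cache Block map: block hash -> cached block.
        self.cached_blocks: dict[int, Block] = {}
        self.req_blocks: dict[str, list[Block]] = {}

    def allocate_slots(self, request_id: str,
                       new_token_ids: list[int]) -> Optional[list[Block]]:
        blocks = self.req_blocks.setdefault(request_id, [])
        num_tokens = sum(len(b.token_ids) for b in blocks) + len(new_token_ids)
        # Step 1: number of new blocks required (ceil division); give up
        # if the free queue cannot supply them.
        num_required = max(0, -(-num_tokens // self.block_size) - len(blocks))
        if num_required > len(self.free_queue):
            return None
        # Step 2: pop blocks from the head of the free queue. A cached block
        # is evicted: its hash is dropped so no other request can reuse it.
        for _ in range(num_required):
            _, block = self.free_queue.popitem(last=False)
            if block.block_hash is not None:
                if self.cached_blocks.get(block.block_hash) is block:
                    del self.cached_blocks[block.block_hash]
                block.block_hash = None
            block.token_ids = []
            blocks.append(block)
        # Step 3: append token IDs to the first block with free slots; once
        # a block becomes full, hash it (chained with its parent) and cache it.
        for token_id in new_token_ids:
            idx = next(i for i, b in enumerate(blocks)
                       if len(b.token_ids) < self.block_size)
            block = blocks[idx]
            block.token_ids.append(token_id)
            if len(block.token_ids) == self.block_size:
                parent_hash = blocks[idx - 1].block_hash if idx > 0 else None
                block.block_hash = hash((parent_hash, tuple(block.token_ids)))
                self.cached_blocks.setdefault(block.block_hash, block)
        return blocks

    def free(self, request_id: str) -> None:
        # In this sketch, freed blocks go to the tail of the free queue in
        # reverse order and keep their hashes until they are evicted.
        for block in reversed(self.req_blocks.pop(request_id, [])):
            self.free_queue[block.block_id] = block
```

For example, with 4 blocks of size 2, `allocate_slots("a", [1, 2, 3])` takes blocks 0 and 1 and caches block 0 once it fills. After request `a` is freed, a later request that pops a still-cached block from the free-queue head removes that block's hash from `cached_blocks`, which is the eviction described in step 2.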