Unverified commit 5d91d2b2 authored by maang-h, committed by GitHub

[Doc] Add allocate_slots parameter docs (#29777)


Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
parent c014de1e
@@ -230,6 +230,9 @@ class KVCacheManager:
 delay_cache_blocks: Whether to skip caching the blocks. This is
 used by P/D when allocating blocks used in a KV transfer
 which will complete in a future step.
+num_encoder_tokens: The number of encoder tokens to allocate for
+cross-attention in encoder-decoder models (e.g., Whisper).
+For decoder-only models, this should be 0.
 
 Blocks layout:
 ```
...
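To illustrate what the new `num_encoder_tokens` argument implies for a paged KV cache, here is a minimal sketch, assuming the cache is split into fixed-size blocks of `block_size` tokens and that encoder tokens are stored in whole blocks. The helper `num_cross_attn_blocks` is hypothetical and is not part of vLLM's `KVCacheManager` API; it only shows the arithmetic: decoder-only models pass 0 and need no cross-attention blocks, while encoder-decoder models round the encoder length up to a whole number of blocks.

```python
# Hypothetical helper, NOT vLLM's API. Assumes a paged KV cache where every
# block holds exactly `block_size` tokens.
def num_cross_attn_blocks(num_encoder_tokens: int, block_size: int) -> int:
    """Estimate how many KV cache blocks the encoder tokens would occupy."""
    if num_encoder_tokens < 0:
        raise ValueError("num_encoder_tokens must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Decoder-only models pass 0, which yields 0 blocks.
    # Otherwise use ceiling division: a partially filled block
    # still occupies a whole block.
    return -(-num_encoder_tokens // block_size)


if __name__ == "__main__":
    print(num_cross_attn_blocks(0, 16))     # decoder-only request
    print(num_cross_attn_blocks(1500, 16))  # encoder-decoder request
```

With `block_size=16`, 1500 encoder tokens fill 93 full blocks plus one partial block, so the sketch reports 94; a decoder-only request reports 0.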