`CacheConfig.block_size` should always be `int` when used (#17052)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit f3a21e9c (unverified), authored Apr 23, 2025 by Harry Mellor, committed by GitHub Apr 23, 2025
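
The pattern this commit adopts (annotate the field with its final type, keep `None` only as a pre-initialisation placeholder, and let a platform hook fill it in) can be sketched roughly as follows. This is a minimal illustration, not vLLM's code: `CacheConfigSketch`, `resolve_block_size`, and the values listed in the `BlockSize` alias are hypothetical stand-ins, and the real logic lives in `Platform.check_and_update_configs()`.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical stand-in for vLLM's BlockSize alias; the real set of
# allowed values may differ.
BlockSize = Literal[8, 16, 32, 64, 128]


@dataclass
class CacheConfigSketch:
    # Annotated as BlockSize (not Optional[BlockSize]) so that code reading
    # the field can treat it as an int. None is only a placeholder until
    # the platform hook has run, hence the type-checker suppression.
    block_size: BlockSize = None  # type: ignore
    gpu_memory_utilization: float = 0.9


def resolve_block_size(config: CacheConfigSketch, platform_default: int) -> None:
    """Hypothetical platform hook: set block_size only if the user left it unset."""
    if config.block_size is None:
        config.block_size = platform_default  # type: ignore[assignment]


# No user-specified value: the platform default is applied
# (128 is the HPU default mentioned in the docstring).
config = CacheConfigSketch()
resolve_block_size(config, platform_default=128)
assert isinstance(config.block_size, int)
```

The trade-off is that the annotation is briefly untrue between construction and the platform hook, which is why the default needs a `# type: ignore`; in exchange, every later use of `block_size` avoids an `Optional` check.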

Showing with 5 additions and 2 deletions

vllm/config.py +5 -2

--- a/vllm/config.py
+++ b/vllm/config.py
@@ -1261,11 +1261,14 @@ PrefixCachingHashAlgo = Literal["builtin", "sha256"]
 class CacheConfig:
    """Configuration for the KV cache."""
-    block_size: Optional[BlockSize] = None
+    block_size: BlockSize = None  # type: ignore
    """Size of a contiguous cache block in number of tokens. This is ignored on
    neuron devices and set to `--max-model-len`. On CUDA devices, only block
    sizes up to 32 are supported. On HPU devices, block size defaults to 128.
-    """
+    This config has no static default. If left unspecified by the user, it will
+    be set in `Platform.check_and_update_configs()` based on the current
+    platform."""
    gpu_memory_utilization: float = 0.9
    """The fraction of GPU memory to be used for the model executor, which can
    range from 0 to 1. For example, a value of 0.5 would imply 50% GPU memory