Commit 4a84480f authored by zhuwenwen

update default block_size

parent b61f7a69
@@ -1464,7 +1464,7 @@ PrefixCachingHashAlgo = Literal["builtin", "sha256"]
 class CacheConfig:
     """Configuration for the KV cache."""
-    block_size: SkipValidation[BlockSize] = None # type: ignore
+    block_size: SkipValidation[BlockSize] = 64 # type: ignore
     """Size of a contiguous cache block in number of tokens. This is ignored on
     neuron devices and set to `--max-model-len`. On CUDA devices, only block
     sizes up to 32 are supported. On HPU devices, block size defaults to 128.
...
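The docstring defines `block_size` as the number of tokens held by one contiguous cache block. As a rough illustration of what moving the default to 64 implies, the sketch below computes how many blocks a sequence occupies and how many token slots in its last block stay unused. The helper names `blocks_needed` and `unused_slots` are hypothetical and are not part of `CacheConfig`; the sketch assumes each sequence is given whole blocks, and the 1000-token sequence length is an arbitrary example value.

```python
def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Whole cache blocks required to hold num_tokens (ceiling division)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-num_tokens // block_size)


def unused_slots(num_tokens: int, block_size: int) -> int:
    """Token slots left empty in the last, partially filled block."""
    return blocks_needed(num_tokens, block_size) * block_size - num_tokens


if __name__ == "__main__":
    # Compare a few block sizes for an arbitrary 1000-token sequence.
    for block_size in (16, 32, 64):
        print(
            f"block_size={block_size}: "
            f"{blocks_needed(1000, block_size)} blocks, "
            f"{unused_slots(1000, block_size)} unused slots"
        )
```

Under these assumptions, a larger block size means fewer blocks to track per sequence, at the cost of potentially more unused slots in the final block.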