Commit e2b31243 authored by Seiji Eicher, committed by GitHub


[Docs] Update `CacheConfig` block_size docstring to remove inaccurate limit when using CUDA (#35632)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
parent c3598d02
@@ -40,8 +40,7 @@ class CacheConfig:
     """Configuration for the KV cache."""
     block_size: SkipValidation[BlockSize] = None  # type: ignore[assignment]
-    """Size of a contiguous cache block in number of tokens. On CUDA devices,
-    only block sizes up to 32 are supported.
+    """Size of a contiguous cache block in number of tokens.
     This config has no static default. If left unspecified by the user, it will
     be set in `Platform.check_and_update_config()` based on the current
......
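The docstring describes a field with no static default that a platform hook fills in later. A minimal sketch of that pattern follows. `CacheConfigSketch`, `PlatformSketch`, and the fallback value of 16 are hypothetical names and numbers chosen for illustration, not vLLM's actual classes or defaults; only the method name `check_and_update_config` is taken from the docstring, and its real signature may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for a KV cache config; not vLLM's actual class."""

    # No static default: None means "not specified by the user".
    block_size: Optional[int] = None


class PlatformSketch:
    """Hypothetical platform hook; the fallback value is an assumption."""

    default_block_size = 16  # assumed for illustration only

    @classmethod
    def check_and_update_config(cls, cache_config: CacheConfigSketch) -> None:
        # Fill in a platform-specific value only when the user left it unset.
        if cache_config.block_size is None:
            cache_config.block_size = cls.default_block_size


# Unspecified by the user: the hook supplies the platform value.
unset = CacheConfigSketch()
PlatformSketch.check_and_update_config(unset)

# Specified by the user: the hook leaves the value untouched.
explicit = CacheConfigSketch(block_size=64)
PlatformSketch.check_and_update_config(explicit)
```

The value 64 above is arbitrary. It only shows that this sketch does not clamp a user-supplied size, in keeping with the removal of the "up to 32" wording from the docstring.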