Unverified commit 7901109e, authored by linhaifeng, committed by GitHub

[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603)


Signed-off-by: linhaifeng <1371675203@qq.com>
parent 13f6630a
@@ -609,7 +609,7 @@ def _num_tokens_to_min_blocks(num_tokens: int, block_size: int) -> int:
Compute the minimum number of blocks required to hold num_tokens tokens,
given block_size
"""
- return (num_tokens + block_size) // block_size
+ return (num_tokens + block_size - 1) // block_size
def make_empty_slot_mapping_tensor(device: torch.device | str):
@@ -694,7 +694,7 @@ def make_block_tables_slot_mapping(
For a sequence with num_tokens tokens the minimum number
of required KV cache blocks is
- num_blocks = (num_tokens + block_size) // block_size
+ num_blocks = (num_tokens + block_size - 1) // block_size
Then the minimum KV cache size in blocks is
......
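
To see why the change matters, the following is a minimal standalone sketch that compares the two formulas from the diff against `math.ceil`. The function names `min_blocks_old` and `min_blocks_new` and the sample `block_size` of 16 are hypothetical and chosen for illustration only; they are not part of the patched module.

```python
import math


def min_blocks_old(num_tokens: int, block_size: int) -> int:
    # Pre-fix formula from the diff (hypothetical wrapper name).
    return (num_tokens + block_size) // block_size


def min_blocks_new(num_tokens: int, block_size: int) -> int:
    # Post-fix formula from the diff: integer ceiling division.
    return (num_tokens + block_size - 1) // block_size


block_size = 16  # illustrative value, not taken from the source
for num_tokens in (0, 1, 15, 16, 17, 32):
    old = min_blocks_old(num_tokens, block_size)
    new = min_blocks_new(num_tokens, block_size)
    # The fixed formula matches the mathematical ceiling for these inputs.
    assert new == math.ceil(num_tokens / block_size)
    print(f"num_tokens={num_tokens:>2}  old={old}  new={new}")
```

In this sketch the two formulas disagree only when `num_tokens` is an exact multiple of `block_size` (including zero), where the old expression reports one block more than is needed.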