Unverified commit 2edc87b1, authored by Thien Tran, committed by GitHub

[Bugfix] Fix cache block size calculation for CPU MLA (#15848)


Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
parent 4203926f
......
@@ -106,7 +106,7 @@ class CPUCacheEngine:
         num_layers = model_config.get_num_layers(parallel_config)
         key_cache_block = block_size * num_heads * head_size
-        value_cache_block = key_cache_block
+        value_cache_block = key_cache_block if not model_config.use_mla else 0
         total = num_layers * (key_cache_block + value_cache_block)
         if cache_dtype == "auto":
             dtype = model_config.dtype
......
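
To illustrate the effect of the one-line change above, here is a minimal standalone sketch of the element count per cache block. The function name `cache_block_elements` and the sample shapes are hypothetical and chosen only for illustration; they are not taken from the commit. The sketch assumes, as the diff suggests, that with MLA enabled no separate value cache is allocated.

```python
def cache_block_elements(
    block_size: int,
    num_heads: int,
    head_size: int,
    num_layers: int,
    use_mla: bool,
) -> int:
    """Count the elements in one cache block across all layers.

    Hypothetical helper mirroring the arithmetic in the diff; it is not
    part of the patched code.
    """
    key_cache_block = block_size * num_heads * head_size
    # With MLA the value cache contributes nothing to the block size,
    # so only the key cache term is counted.
    value_cache_block = 0 if use_mla else key_cache_block
    return num_layers * (key_cache_block + value_cache_block)


# Illustrative shapes only (assumed, not from the commit).
standard = cache_block_elements(16, 8, 128, 32, use_mla=False)
mla = cache_block_elements(16, 1, 576, 32, use_mla=True)
print(standard, mla)
```

Before the fix, the MLA case would have counted the key cache term twice, doubling the reported block size.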