Unverified commit e390742c authored by XueLiang Yang, committed by GitHub

Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536)


Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com>
Co-authored-by: xueliangyang-oeuler <yxl546827391@gmail.com>
parent 7a6ebcbf
@@ -240,7 +240,7 @@ class CpuGpuOffloadingHandlers:
 gpu_shape = gpu_tensor.shape
 attn_backend = attn_backends[layer_name]
 test_shape = attn_backend.get_kv_cache_shape(
-    num_blocks=1234, block_size=16, num_kv_heads=8, head_size=256
+    num_blocks=1234, block_size=16, num_kv_heads=1, head_size=256
 )
 has_layers_dim = False
...
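The hunk passes sentinel sizes (note `num_blocks=1234`) to `get_kv_cache_shape`, which suggests the handler only uses `test_shape` to locate dimensions in the backend's KV cache layout. The sketch below illustrates why probing with `num_kv_heads=1` avoids the failure. It assumes, as the commit title implies, that an MLA backend rejects any head count other than 1. `StandardBackend`, `MLALikeBackend`, and `find_block_dim` are hypothetical stand-ins, not vLLM classes or functions, and the shapes they return are illustrative.

```python
# Hypothetical stand-ins for attention backends (NOT vLLM classes).
# They only mimic a get_kv_cache_shape(...) method with the keyword
# arguments seen in the diff.
class StandardBackend:
    @staticmethod
    def get_kv_cache_shape(num_blocks: int, block_size: int,
                           num_kv_heads: int, head_size: int) -> tuple[int, ...]:
        # Separate K and V planes, with one slot per KV head.
        return (2, num_blocks, block_size, num_kv_heads, head_size)


class MLALikeBackend:
    @staticmethod
    def get_kv_cache_shape(num_blocks: int, block_size: int,
                           num_kv_heads: int, head_size: int) -> tuple[int, ...]:
        # Assumption: MLA caches one compressed latent vector per token,
        # so the backend only accepts a single "KV head".
        if num_kv_heads != 1:
            raise AssertionError("MLA expects num_kv_heads == 1")
        return (num_blocks, block_size, head_size)


def find_block_dim(backend, num_kv_heads: int) -> int:
    """Probe a backend with sentinel sizes and return the index of the
    num_blocks dimension in the shape it reports (hypothetical helper)."""
    sentinel_blocks = 1234
    shape = backend.get_kv_cache_shape(
        num_blocks=sentinel_blocks, block_size=16,
        num_kv_heads=num_kv_heads, head_size=256,
    )
    # The sentinel is distinct from the other sizes, so index() is unambiguous.
    return shape.index(sentinel_blocks)


if __name__ == "__main__":
    for backend in (StandardBackend, MLALikeBackend):
        try:
            find_block_dim(backend, num_kv_heads=8)
            print(backend.__name__, "num_kv_heads=8: ok")
        except AssertionError:
            print(backend.__name__, "num_kv_heads=8: AssertionError")
        print(backend.__name__, "num_kv_heads=1: block dim",
              find_block_dim(backend, num_kv_heads=1))
```

In this sketch, the head count does not change where `num_blocks` sits for the standard layout, so probing with 1 works for both kinds of backend. Probing with 8 trips only the MLA-style check.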