OpenDAS / vllm_cscc
Commit 2edc87b1 (Unverified)
Authored Apr 02, 2025 by Thien Tran; committed by GitHub on Apr 02, 2025.
[Bugfix] Fix cache block size calculation for CPU MLA (#15848)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

Parent: 4203926f
Showing 1 changed file with 1 addition and 1 deletion.
vllm/worker/cpu_worker.py (+1 -1)
    @@ -106,7 +106,7 @@ class CPUCacheEngine:
             num_layers = model_config.get_num_layers(parallel_config)
     
             key_cache_block = block_size * num_heads * head_size
    -        value_cache_block = key_cache_block
    +        value_cache_block = key_cache_block if not model_config.use_mla else 0
             total = num_layers * (key_cache_block + value_cache_block)
             if cache_dtype == "auto":
                 dtype = model_config.dtype
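With standard attention, each cached token needs both a key and a value vector, so the value part of a block mirrors the key part. With multi-head latent attention (MLA), only one compressed latent vector per token is cached, which is why the fix sets `value_cache_block` to 0. The sketch below reproduces the arithmetic outside vLLM. It assumes a fixed table of element sizes in place of the torch dtype lookup. The names `cache_block_size_bytes`, `num_cache_blocks`, and `DTYPE_SIZE` are hypothetical, and dividing a fixed byte budget by the block size is an assumed allocation rule used only for illustration.

```python
# Hypothetical element sizes standing in for vLLM's torch dtype lookup.
DTYPE_SIZE = {"float32": 4, "float16": 2, "bfloat16": 2}


def cache_block_size_bytes(
    block_size: int,
    num_heads: int,
    head_size: int,
    num_layers: int,
    dtype: str,
    use_mla: bool,
) -> int:
    """Bytes for one KV-cache block summed over all layers (sketch)."""
    key_cache_block = block_size * num_heads * head_size
    # With MLA there is no separate value cache, so the value part is 0.
    value_cache_block = key_cache_block if not use_mla else 0
    total = num_layers * (key_cache_block + value_cache_block)
    return DTYPE_SIZE[dtype] * total


def num_cache_blocks(budget_bytes: int, block_bytes: int) -> int:
    """Whole blocks that fit in a fixed byte budget (assumed allocation rule)."""
    return budget_bytes // block_bytes


# Illustrative values only: 16 tokens per block, 1 latent "head" of size 576,
# 2 layers, bfloat16, and a 4 GiB CPU cache budget.
budget = 4 * 1024**3
mla = cache_block_size_bytes(16, 1, 576, 2, "bfloat16", use_mla=True)
# Pre-fix behaviour for an MLA model: a value cache is counted as if it existed.
old = cache_block_size_bytes(16, 1, 576, 2, "bfloat16", use_mla=False)
print(mla, num_cache_blocks(budget, mla))  # 36864 116508
print(old, num_cache_blocks(budget, old))  # 73728 58254
```

With these illustrative numbers, the pre-fix formula reports twice the real block size for an MLA model. As a result, only about half as many blocks fit in the same budget.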