OpenDAS / vllm_cscc
Commit 1e123529 (Unverified)
Authored May 31, 2025 by Yong Hoon Shin
Committed by GitHub, May 31, 2025

[Misc] Fix estimated max model len msg (#18966)

Signed-off-by: Yong Hoon Shin <yhshin@meta.com>

Parent: dff80b0e

Showing 1 changed file with 5 additions and 4 deletions

--- a/vllm/v1/core/kv_cache_utils.py
+++ b/vllm/v1/core/kv_cache_utils.py
@@ -544,16 +544,17 @@ def check_enough_kv_cache_memory(vllm_config: VllmConfig,
                                                   available_memory)
         estimated_msg = ""
         if estimated_max_len > 0:
-            estimated_msg = " Based on the available memory,"
-            f" the estimated maximum model length is {estimated_max_len}."
+            estimated_msg = (
+                "Based on the available memory, "
+                f"the estimated maximum model length is {estimated_max_len}.")
 
         raise ValueError(
             f"To serve at least one request with the models's max seq len "
             f"({max_model_len}), ({needed_memory/GiB_bytes:.2f} GiB KV "
             f"cache is needed, which is larger than the available KV cache "
-            f"memory ({available_memory/GiB_bytes:.2f} GiB)."
+            f"memory ({available_memory/GiB_bytes:.2f} GiB). "
             f"{estimated_msg} "
-            f" Try increasing `gpu_memory_utilization` or decreasing "
+            f"Try increasing `gpu_memory_utilization` or decreasing "
             f"`max_model_len` when initializing the engine.")
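
Why the parentheses matter: in the removed lines, the second string literal sits on its own line after a completed assignment, so Python treats it as a separate expression statement and discards it. The estimated length would therefore never reach the error message. The sketch below reproduces both layouts in isolation. The function names `build_old` and `build_new` are hypothetical and exist only for this illustration; they are not part of vLLM.

```python
def build_old(estimated_max_len: int) -> str:
    """Pre-fix layout: the f-string is a standalone statement, not part of the assignment."""
    estimated_msg = ""
    if estimated_max_len > 0:
        estimated_msg = " Based on the available memory,"
        # Evaluated and then thrown away: nothing joins it to the line above.
        f" the estimated maximum model length is {estimated_max_len}."
    return estimated_msg


def build_new(estimated_max_len: int) -> str:
    """Post-fix layout: parentheses make the two literals one concatenated expression."""
    estimated_msg = ""
    if estimated_max_len > 0:
        estimated_msg = (
            "Based on the available memory, "
            f"the estimated maximum model length is {estimated_max_len}.")
    return estimated_msg


if __name__ == "__main__":
    print(repr(build_old(4096)))
    print(repr(build_new(4096)))
```

With a positive estimate, `build_old` returns only the first fragment, while `build_new` returns the full sentence including the number. The other two changed lines in the diff move the separating spaces so that the sentences in the final message are joined by a single space rather than a missing or leading one.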