OpenDAS / vllm_cscc
Commit 9ec1e306 (Unverified)
Authored Jul 01, 2025 by Lifans; committed by GitHub Jul 01, 2025
[Misc][Doc] Add missing comment for LLM (#20285)
Signed-off-by: Lifan Shen <lifans@meta.com>
Parent: 9dae7d46
Changes: 1
Showing 1 changed file with 20 additions and 12 deletions
vllm/entrypoints/llm.py (+20 -12)
@@ -132,6 +132,14 @@ class LLM:
 hf_overrides: If a dictionary, contains arguments to be forwarded to the
 HuggingFace config. If a callable, it is called to update the
 HuggingFace config.
+mm_processor_kwargs: Arguments to be forwarded to the model's processor
+for multi-modal data, e.g., image processor. Overrides for the
+multi-modal processor obtained from `AutoProcessor.from_pretrained`.
+The available overrides depend on the model that is being run.
+For example, for Phi-3-Vision: `{"num_crops": 4}`.
+override_pooler_config: Initialize non-default pooling config or
+override default pooling config for the pooling model.
+e.g. `PoolerConfig(pooling_type="mean", normalize=False)`.
 compilation_config: Either an integer or a dictionary. If it is an
 integer, it is used as the level of compilation optimization. If it
 is a dictionary, it can specify the full compilation configuration.
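To illustrate the arguments documented in this hunk, here is a minimal sketch that collects them into a plain dictionary without importing vLLM. The helper `build_llm_kwargs` is hypothetical and the model name is an assumption. The `{"num_crops": 4}` value and the integer form of `compilation_config` come from the docstring above. `override_pooler_config` is left out because it needs a `PoolerConfig` object from vLLM.

```python
from typing import Any, Optional


def build_llm_kwargs(
    model: str,
    num_crops: int = 4,
    compilation_level: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for vllm.LLM(...)."""
    kwargs: dict[str, Any] = {
        "model": model,
        # Forwarded to the model's multi-modal processor; the docstring
        # gives {"num_crops": 4} as the Phi-3-Vision example.
        "mm_processor_kwargs": {"num_crops": num_crops},
    }
    if compilation_level is not None:
        # Per the docstring, an integer is used as the level of
        # compilation optimization.
        kwargs["compilation_config"] = compilation_level
    return kwargs


# The model name below is an assumption chosen for illustration.
kwargs = build_llm_kwargs("microsoft/Phi-3-vision-128k-instruct")
print(kwargs)
# With vLLM installed, these would be passed as: LLM(**kwargs)
```

Building the dictionary separately keeps the sketch runnable without a GPU or a vLLM install. Which overrides are actually accepted depends on the model being run.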
@@ -1347,16 +1355,16 @@ class LLM:
during the sleep period, before `wake_up` is called.
Args:
level: The sleep level. Level 1 sleep will offload the model
weights and discard the kv cache. The content of kv cache
is forgotten. Level 1 sleep is good for sleeping and waking
up the engine to run the same model again. The model weights
are backed up in CPU memory. Please make sure there's enough
CPU memory to store the model weights. Level 2 sleep will
discard both the model weights and the kv cache. The content
of both the model weights and kv cache is forgotten. Level 2
sleep is good for sleeping and waking up the engine to run a
different model or update the model, where previous model
weights are not needed. It reduces CPU memory pressure.
"""
self.reset_prefix_cache()
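The two sleep levels described in this docstring can be summarized in a small lookup. This is a sketch only: `describe_sleep` and `choose_sleep_level` are hypothetical helpers that restate the docstring, not vLLM functions, and rejecting other levels is an assumption of the sketch.

```python
def describe_sleep(level: int) -> dict[str, str]:
    """Hypothetical summary of what each documented sleep level does."""
    if level == 1:
        # Weights are backed up in CPU memory; the kv cache is forgotten.
        return {"weights": "offloaded to CPU memory", "kv_cache": "discarded"}
    if level == 2:
        # Both weights and kv cache are forgotten, which reduces CPU
        # memory pressure.
        return {"weights": "discarded", "kv_cache": "discarded"}
    # Assumption of this sketch: only the two documented levels are accepted.
    raise ValueError(f"undocumented sleep level: {level}")


def choose_sleep_level(same_model_after_wake: bool) -> int:
    """Pick level 1 to resume the same model, level 2 to switch or update."""
    return 1 if same_model_after_wake else 2


level = choose_sleep_level(same_model_after_wake=True)
print(level, describe_sleep(level))
# With a vLLM engine at hand, the call would be: llm.sleep(level=level)
```

Level 1 trades CPU memory for a faster return to the same model. Level 2 suits the case where the previous weights are not needed.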
@@ -1366,12 +1374,12 @@ class LLM:
"""
Wake up the engine from sleep mode. See the [sleep][] method
for more details.
Args:
tags: An optional list of tags to reallocate the engine memory
for specific memory allocations. Values must be in
`("weights", "kv_cache")`. If None, all memory is reallocated.
wake_up should be called with all tags (or None) before the
engine is used again.
"""
self.llm_engine.wake_up(tags)
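The rule that `wake_up` must be called with all tags (or `None`) before the engine is used again can be modeled with a small tracker. This is a toy sketch, not vLLM's implementation: `WakeTracker` and its `ready` property are hypothetical, and only the tag values `"weights"` and `"kv_cache"` come from the docstring.

```python
from typing import Iterable, Optional

# Tag values documented for wake_up.
VALID_TAGS = ("weights", "kv_cache")


class WakeTracker:
    """Toy model of which allocations are still asleep; not vLLM code."""

    def __init__(self) -> None:
        # After sleep, every tagged allocation starts out asleep.
        self.asleep = set(VALID_TAGS)

    def wake_up(self, tags: Optional[Iterable[str]] = None) -> None:
        # None means "reallocate all memory", as in the docstring.
        requested = set(VALID_TAGS) if tags is None else set(tags)
        unknown = requested - set(VALID_TAGS)
        if unknown:
            raise ValueError(f"unknown tags: {sorted(unknown)}")
        self.asleep -= requested

    @property
    def ready(self) -> bool:
        # The engine should only be used once nothing is left asleep.
        return not self.asleep


tracker = WakeTracker()
tracker.wake_up(["weights"])
print(tracker.ready)  # False: the kv cache has not been reallocated yet
tracker.wake_up(["kv_cache"])
print(tracker.ready)  # True: all tags have been woken up
```

Waking tags one at a time, as modeled here, leaves room to do other work between reallocating the weights and reallocating the kv cache. Calling `wake_up()` with no tags covers both in one step.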