OpenDAS / vllm_cscc
Commit c16fb5da (unverified), authored Apr 18, 2025 by Cyrus Leung, committed by GitHub Apr 17, 2025
[Doc] Improve help examples for `--compilation-config` (#16729)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent e37073ef
Showing 3 changed files with 17 additions and 8 deletions
docs/source/design/v1/torch_compile.md +1 -1
tests/engine/test_arg_utils.py +12 -4
vllm/engine/arg_utils.py +4 -3
docs/source/design/v1/torch_compile.md
@@ -134,6 +134,6 @@ The cudagraphs are captured and managed by the compiler backend, and replayed wh
 By default, vLLM will try to determine a set of sizes to capture cudagraph. You can also override it using the config `cudagraph_capture_sizes`:

-`VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.2-1B --compilation_config "{'cudagraph_capture_sizes': [1, 2, 4, 8]}"`
+`VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.2-1B --compilation-config "{'cudagraph_capture_sizes': [1, 2, 4, 8]}"`

 Then it will only capture cudagraph for the specified sizes. It can be useful to have fine-grained control over the cudagraph capture.
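Quoting a dict inside a shell command by hand is easy to get wrong, so one option is to build the argument programmatically. The following is a minimal sketch, not part of this commit: the helper name `compilation_config_arg` is hypothetical, and only the flag name and the two config keys are taken from the diff above.

```python
import json
import shlex


def compilation_config_arg(level, capture_sizes):
    """Build a shell-safe `--compilation-config` argument (hypothetical helper)."""
    config = {"level": level, "cudagraph_capture_sizes": list(capture_sizes)}
    # json.dumps emits double-quoted JSON; shlex.quote wraps it for a POSIX shell.
    return "--compilation-config " + shlex.quote(json.dumps(config))


cmd = ("VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.2-1B "
       + compilation_config_arg(3, [1, 2, 4, 8]))
print(cmd)
```

Splitting the printed command with `shlex.split` should give back the JSON text as a single argument, which is what the CLI would receive.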
tests/engine/test_arg_utils.py
@@ -53,12 +53,20 @@ def test_compilation_config():
     assert args.compilation_config.level == 3
 
     # set to string form of a dict
-    args = parser.parse_args(["--compilation-config", "{'level': 3}"])
-    assert args.compilation_config.level == 3
+    args = parser.parse_args([
+        "--compilation-config",
+        "{'level': 3, 'cudagraph_capture_sizes': [1, 2, 4, 8]}",
+    ])
+    assert (args.compilation_config.level == 3 and
+            args.compilation_config.cudagraph_capture_sizes == [1, 2, 4, 8])
 
     # set to string form of a dict
-    args = parser.parse_args(["--compilation-config={'level': 3}"])
-    assert args.compilation_config.level == 3
+    args = parser.parse_args([
+        "--compilation-config="
+        "{'level': 3, 'cudagraph_capture_sizes': [1, 2, 4, 8]}",
+    ])
+    assert (args.compilation_config.level == 3 and
+            args.compilation_config.cudagraph_capture_sizes == [1, 2, 4, 8])
 
 
 def test_prefix_cache_default():
...
...
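The tests above pass the config as a single-quoted dict string, which is a Python literal rather than strict JSON. As a rough illustration of how a parser could accept both spellings, here is a self-contained sketch using plain `argparse`. It is an assumption for illustration only: `parse_config` is a hypothetical function and does not reproduce vLLM's `CompilationConfig.from_cli`.

```python
import argparse
import ast
import json


def parse_config(value: str) -> dict:
    """Parse a JSON object or a Python-literal dict string (hypothetical)."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        # Single-quoted dicts such as "{'level': 3}" are not valid JSON,
        # so fall back to a safe literal evaluation.
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError) as exc:
            raise argparse.ArgumentTypeError(f"invalid config: {value!r}") from exc
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError(f"expected a dict, got: {value!r}")
    return parsed


parser = argparse.ArgumentParser()
parser.add_argument("--compilation-config", type=parse_config)

text = "{'level': 3, 'cudagraph_capture_sizes': [1, 2, 4, 8]}"
spaced = parser.parse_args(["--compilation-config", text])
joined = parser.parse_args(["--compilation-config=" + text])
print(spaced.compilation_config == joined.compilation_config)
```

Both the space-separated and the `=`-joined forms reach the same `type` callable, which is why the tests can exercise them with identical assertions.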
vllm/engine/arg_utils.py
@@ -939,10 +939,11 @@ class EngineArgs:
                             'testing only. level 3 is the recommended level '
                             'for production.\n'
                             'To specify the full compilation config, '
-                            'use a JSON string.\n'
+                            'use a JSON string, e.g. ``{"level": 3, '
+                            '"cudagraph_capture_sizes": [1, 2, 4, 8]}``\n'
                             'Following the convention of traditional '
-                            'compilers, using -O without space is also '
-                            'supported. -O3 is equivalent to -O 3.')
+                            'compilers, using ``-O`` without space is also '
+                            'supported. ``-O3`` is equivalent to ``-O 3``.')
         parser.add_argument('--kv-transfer-config',
                             type=KVTransferConfig.from_cli,
...
...
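The help text says `-O3` is equivalent to `-O 3`. The sketch below shows that equivalence with a plain `argparse` parser, assuming `-O` is registered as a short alias of `--compilation-config` and modelling only the integer-level case. The `dest="level"` name is chosen for illustration, and vLLM's own parser may handle this differently.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumption: "-O" is a short alias for the long flag; only int levels are modelled.
parser.add_argument("-O", "--compilation-config", dest="level", type=int)

forms = (
    ["-O3"],                        # value attached to the short option
    ["-O", "3"],                    # value as a separate argument
    ["--compilation-config", "3"],  # long option, separate value
    ["--compilation-config=3"],     # long option, "=" form
)
levels = [parser.parse_args(argv).level for argv in forms]
print(levels)
```

`argparse` accepts a value attached directly to a single-character option, so all four spellings should yield the same level here.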