Commit 7bc5fb0d (unverified)

[CPU][doc] add torch.compile param in example commands (#10349)

Authored Sep 12, 2025 by Zaili Wang; committed Sep 11, 2025 by GitHub.
Parent: 144ee5f3
Showing 1 changed file with 11 additions and 3 deletions.

docs/platforms/cpu_server.md (+11, −3)
````diff
@@ -139,9 +139,10 @@ Notes:
    You may need to set proper `--max-total-tokens` to avoid the out-of-memory error.
 3. For optimizing decoding with torch.compile, please add the flag `--enable-torch-compile`.
-   To specify the maximum batch size when using torch compile, set the flag `--torch-compile-max-bs`.
-   For example, `--enable-torch-compile --torch-compile-max-bs 4` means using torch compile and setting the
-   maximum batch size to 4.
+   To specify the maximum batch size when using `torch.compile`, set the flag `--torch-compile-max-bs`.
+   For example, `--enable-torch-compile --torch-compile-max-bs 4` means using `torch.compile`
+   and setting the maximum batch size to 4. Currently the maximum applicable batch size
+   for optimizing with `torch.compile` is 16.
 4. A warmup step is automatically triggered when the service is started.
    The server is ready when you see the log `The server is fired up and ready to roll!`.
````
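A quick way to confirm that the warmup described in note 4 has finished is to poll the server over HTTP until it responds. This is a minimal sketch, assuming the sglang server's `/health` endpoint and the default port `30000`, neither of which is part of this commit; adjust to your `--host`/`--port` flags:

```bash
# Poll until the server answers on /health; endpoint and port 30000
# are sglang defaults (assumption), not introduced by this commit.
until curl -sf http://localhost:30000/health > /dev/null; do
  echo "waiting for warmup to finish..."
  sleep 5
done
echo "The server is up."
```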
````diff
@@ -184,6 +185,8 @@ python -m sglang.launch_server \
     --quantization w8a8_int8 \
     --host 0.0.0.0 \
     --mem-fraction-static 0.8 \
+    --enable-torch-compile \
+    --torch-compile-max-bs 4 \
     --tp 6
 ```
````
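As a smoke test of a server launched with the command above, a single request can be sent to sglang's native `/generate` endpoint. A minimal sketch, again assuming the default port `30000`; the prompt and sampling parameters are illustrative:

```bash
# Illustrative single-request test; /generate and port 30000 are
# sglang defaults (assumption). Prompt and params are examples only.
curl -s http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "The capital of France is", "sampling_params": {"max_new_tokens": 16, "temperature": 0}}'
```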
````diff
@@ -197,8 +200,13 @@ python -m sglang.launch_server \
     --device cpu \
     --host 0.0.0.0 \
     --mem-fraction-static 0.8 \
+    --enable-torch-compile \
+    --torch-compile-max-bs 4 \
     --tp 6
 ```
+
+Note: Please set `--torch-compile-max-bs` to the maximum desired batch size for your deployment,
+which can be up to 16. The value `4` in the examples is illustrative.
 
 Then you can test with `bench_serving` command or construct your own command or script
 following [the benchmarking example](#benchmarking-with-requests).
````
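For the `bench_serving` test mentioned in the closing context lines, an invocation might look like the sketch below. The flag names follow sglang's `sglang.bench_serving` module, and the prompt count and lengths are illustrative rather than recommended values, so verify against `python -m sglang.bench_serving --help`:

```bash
# Illustrative benchmark against the server above (default port 30000).
# Flag values are examples, not tuned recommendations.
python -m sglang.bench_serving \
  --backend sglang \
  --dataset-name random \
  --num-prompts 100 \
  --random-input-len 1024 \
  --random-output-len 128
```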