Commit 159cc741 authored May 31, 2024 by Lianmin Zheng, committed by GitHub May 31, 2024
Make the server random by default (#493)
parent 7d1ebc2d
Showing 2 changed files with 11 additions and 4 deletions
docs/hyperparameter_tuning.md (+6, -3)
python/sglang/srt/server_args.py (+5, -1)
docs/hyperparameter_tuning.md
@@ -22,10 +22,13 @@ On the other hand, if you see `token usage` very high and you frequently see war
 ### Tune `--dp-size` and `--tp-size`
 Data parallelism is better for throughput. When there is enough GPU memory, always favor data parallelism for throughput.
+### (Minor) Tune `--max-prefill-tokens`, `--mem-fraction-static`, `--max-running-requests`.
+If you see out of memory (OOM) errors, you can decrease these parameters.
+If OOM happens during prefill, try to decrease `--max-prefill-tokens`.
+If OOM happens during decoding, try to decrease `--max-running-requests`.
+You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
 ### (Minor) Tune `--schedule-heuristic`
 If you have many shared prefixes, use the default `--schedule-heuristic lpm`. `lpm` stands for longest prefix match.
 When you have no shared prefixes at all or you always send the requests with the shared prefixes together,
 you can try `--schedule-heuristic fcfs`. `fcfs` stands for first come first serve.
-### (Minor) Tune `--max-prefill-tokens`, `--mem-fraction-static`, `--max-running-requests`.
-If you see out of memory errors, you can decrease them. Otherwise, the default value should work well.
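The OOM guidance added in this hunk amounts to a small lookup from the phase in which the error occurred to the flags worth lowering. The sketch below encodes that lookup as a reading aid. `flags_to_decrease` and `OOM_FLAGS_BY_PHASE` are hypothetical names introduced for illustration and are not part of sglang; only the flag names come from the documentation text.

```python
from typing import Dict, List

# Hypothetical table: for each phase, the flags the tuning guide suggests
# decreasing, with the phase-specific flag listed first.
OOM_FLAGS_BY_PHASE: Dict[str, List[str]] = {
    "prefill": ["--max-prefill-tokens", "--mem-fraction-static"],
    "decode": ["--max-running-requests", "--mem-fraction-static"],
}


def flags_to_decrease(phase: str) -> List[str]:
    """Return the flags to try lowering after an OOM in the given phase."""
    if phase not in OOM_FLAGS_BY_PHASE:
        raise ValueError(f"unknown phase {phase!r}; expected 'prefill' or 'decode'")
    # Return a copy so callers cannot mutate the shared table.
    return list(OOM_FLAGS_BY_PHASE[phase])
```

`--mem-fraction-static` appears under both phases because, per the text above, shrinking the KV cache memory pool helps prefill and decoding alike.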
python/sglang/srt/server_args.py
@@ -2,6 +2,7 @@
 import argparse
 import dataclasses
+import random
 from typing import List, Optional, Union
@@ -32,7 +33,7 @@ class ServerArgs:
     # Other runtime options
     tp_size: int = 1
     stream_interval: int = 8
-    random_seed: int = 42
+    random_seed: Optional[int] = None
     # Logging
     log_level: str = "info"
@@ -72,6 +73,9 @@ class ServerArgs:
         elif self.additional_ports is None:
             self.additional_ports = []
+        if self.random_seed is None:
+            self.random_seed = random.randint(0, 1 << 30)
     @staticmethod
     def add_cli_args(parser: argparse.ArgumentParser):
         parser.add_argument(
...
...
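The `server_args.py` change follows a common pattern: leave the seed unset by default, then resolve it to a concrete random value when the arguments object is built. The sketch below isolates that pattern in a stand-alone dataclass. `DemoArgs` is a hypothetical name used for illustration and is not part of sglang, and placing the logic in `__post_init__` is an assumption, since the diff does not show which method contains the new lines.

```python
import dataclasses
import random
from typing import Optional


@dataclasses.dataclass
class DemoArgs:
    # Hypothetical stand-in for ServerArgs; only the seed field is modeled.
    random_seed: Optional[int] = None

    def __post_init__(self):
        # As in the commit: choose a seed only when the caller supplied none.
        if self.random_seed is None:
            # random.randint includes both endpoints, so the seed is in [0, 2**30].
            self.random_seed = random.randint(0, 1 << 30)


default_args = DemoArgs()               # seed drawn at random on each construction
pinned_args = DemoArgs(random_seed=42)  # explicit seed, matching the old default
```

Comparing against `None` rather than testing truthiness matters here: an explicit seed of `0` is a valid choice and is left untouched.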