Commit 159cc741
Authored May 31, 2024 by Lianmin Zheng
Committed by GitHub, May 31, 2024
Make the server random by default (#493)
Parent: 7d1ebc2d
Showing 2 changed files with 11 additions and 4 deletions
docs/hyperparameter_tuning.md (+6 -3)
python/sglang/srt/server_args.py (+5 -1)
docs/hyperparameter_tuning.md
@@ -22,10 +22,13 @@ On the other hand, if you see `token usage` very high and you frequently see war
 ### Tune `--dp-size` and `--tp-size`
 Data parallelism is better for throughput. When there is enough GPU memory, always favor data parallelism for throughput.
+### (Minor) Tune `--max-prefill-tokens`, `--mem-fraction-static`, `--max-running-requests`.
+If you see out of memory (OOM) errors, you can decrease these parameters.
+If OOM happens during prefill, try to decrease `--max-prefill-tokens`.
+If OOM happens during decoding, try to decrease `--max-running-requests`.
+You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
 ### (Minor) Tune `--schedule-heuristic`
 If you have many shared prefixes, use the default `--schedule-heuristic lpm`. `lpm` stands for longest prefix match.
 When you have no shared prefixes at all or you always send the requests with the shared prefixes together,
 you can try `--schedule-heuristic fcfs`. `fcfs` stands for first come first serve.
-### (Minor) Tune `--max-prefill-tokens`, `--mem-fraction-static`, `--max-running-requests`.
-If you see out of memory errors, you can decrease them. Otherwise, the default value should work well.
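
The OOM guidance added to the docs in this commit can be read as a small lookup from the failing phase to the flags worth lowering. The sketch below is only an illustration of that reading: `OOM_FLAGS` and `flags_to_decrease` are hypothetical names, not part of sglang, and the ordering (phase-specific flag first, `--mem-fraction-static` second) is an assumption about priority that the docs do not state.

```python
from typing import Dict, List

# Hypothetical mapping that mirrors the tuning notes in the diff above.
# The phase-specific flag comes first; --mem-fraction-static is listed
# second because the docs say it helps both prefill and decoding.
OOM_FLAGS: Dict[str, List[str]] = {
    "prefill": ["--max-prefill-tokens", "--mem-fraction-static"],
    "decode": ["--max-running-requests", "--mem-fraction-static"],
}


def flags_to_decrease(phase: str) -> List[str]:
    """Return the flags the docs suggest lowering for an OOM in `phase`."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(OOM_FLAGS[phase])
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None


if __name__ == "__main__":
    for phase in ("prefill", "decode"):
        print(phase, "->", ", ".join(flags_to_decrease(phase)))
```

How far to lower each value depends on the model and the hardware; the docs give no numbers, so none are suggested here.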
python/sglang/srt/server_args.py
@@ -2,6 +2,7 @@
 import argparse
 import dataclasses
+import random
 from typing import List, Optional, Union
...
@@ -32,7 +33,7 @@ class ServerArgs:
     # Other runtime options
     tp_size: int = 1
     stream_interval: int = 8
-    random_seed: int = 42
+    random_seed: Optional[int] = None
 
     # Logging
     log_level: str = "info"
...
@@ -72,6 +73,9 @@ class ServerArgs:
         elif self.additional_ports is None:
             self.additional_ports = []
 
+        if self.random_seed is None:
+            self.random_seed = random.randint(0, 1 << 30)
+
     @staticmethod
     def add_cli_args(parser: argparse.ArgumentParser):
         parser.add_argument(
...
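
The effect of the `random_seed` change can be reproduced in isolation. The following is a minimal sketch, not the actual `ServerArgs` class: `SeedArgs` is a hypothetical stand-in with a single field, and it assumes the `None` check runs in a dataclass `__post_init__` hook, which the hunks above do not show.

```python
import dataclasses
import random
from typing import Optional


@dataclasses.dataclass
class SeedArgs:
    """Hypothetical stand-in for the random_seed field of ServerArgs."""

    # None means "pick a fresh seed at construction time".
    random_seed: Optional[int] = None

    def __post_init__(self):
        # Same expression as in the diff; randint includes both endpoints.
        if self.random_seed is None:
            self.random_seed = random.randint(0, 1 << 30)


if __name__ == "__main__":
    print(SeedArgs().random_seed)                # different on most runs
    print(SeedArgs(random_seed=42).random_seed)  # explicit seed is kept
```

With this pattern, two instances built without a seed will usually differ, while passing an explicit value such as the previous default of 42 keeps runs reproducible.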