change / sglang
Commit 2d3ae4e1 (Unverified)
Authored Jul 25, 2024 by Yineng Zhang
Committed by GitHub, Jul 25, 2024

docs: update doc (#713)

Parent: 75f4ccb7
Showing 1 changed file with 8 additions and 5 deletions

--- a/benchmark/blog_v0_2/README.md
+++ b/benchmark/blog_v0_2/README.md
@@ -29,6 +29,9 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruc
 # Meta-Llama-3.1-70B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-70B-Instruct --disable-radix-cache --tp 8
+# Meta-Llama-3-70B-Instruct-FP8
+python -m sglang.launch_server --model-path neuralmagic/Meta-Llama-3-70B-Instruct-FP8 --disable-radix-cache --tp 8
 ```
 ## Benchmark
@@ -59,19 +62,19 @@ cat sglang_offline_benchmark.jsonl | cut -d':' -f12 | cut -d',' -f1
 #### Online benchmark
 ```bash
-# Random dataset, Input [1024, 4096], Output [256, 1024], request rate 1, num prompts 300
+# Random dataset, Input [512, 4096], Output [128, 1024], request rate 1, num prompts 300
 python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 4096 --random-output 1024 --random-range-ratio 0.125 --num-prompts 300 --request-rate 1 --output-file sglang_online_benchmark.jsonl
-# Random dataset, Input [1024, 4096], Output [256, 1024], request rate 2, num prompts 600
+# Random dataset, Input [512, 4096], Output [128, 1024], request rate 2, num prompts 600
 python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 4096 --random-output 1024 --random-range-ratio 0.125 --num-prompts 600 --request-rate 2 --output-file sglang_online_benchmark.jsonl
-# Random dataset, Input [1024, 4096], Output [256, 1024], request rate 4, num prompts 1200
+# Random dataset, Input [512, 4096], Output [128, 1024], request rate 4, num prompts 1200
 python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 4096 --random-output 1024 --random-range-ratio 0.125 --num-prompts 1200 --request-rate 4 --output-file sglang_online_benchmark.jsonl
-# Random dataset, Input [1024, 4096], Output [256, 1024], request rate 8, num prompts 2400
+# Random dataset, Input [512, 4096], Output [128, 1024], request rate 8, num prompts 2400
 python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 4096 --random-output 1024 --random-range-ratio 0.125 --num-prompts 2400 --request-rate 8 --output-file sglang_online_benchmark.jsonl
-# Random dataset, Input [1024, 4096], Output [256, 1024], request rate 16, num prompts 3200
+# Random dataset, Input [512, 4096], Output [128, 1024], request rate 16, num prompts 3200
 python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 4096 --random-output 1024 --random-range-ratio 0.125 --num-prompts 3200 --request-rate 16 --output-file sglang_online_benchmark.jsonl
 # get median e2e latency
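
The comment change in this hunk lines up with the flags, which the commit leaves untouched: with `--random-range-ratio 0.125`, a maximum of 4096 gives a lower bound of 512, and a maximum of 1024 gives 128. Below is a minimal sketch of that arithmetic. It assumes the ratio sets the lower bound as a floored fraction of the maximum, which is an assumption about how `sglang.bench_serving` interprets the flag; `length_range` and `describe` are hypothetical helpers and not part of sglang.

```python
from __future__ import annotations


def length_range(max_len: int, range_ratio: float) -> tuple[int, int]:
    """Return (lower, upper) token-length bounds for the random dataset.

    Assumption: the lower bound is max_len * range_ratio, floored,
    and never smaller than 1.
    """
    lower = max(int(max_len * range_ratio), 1)
    return lower, max_len


def describe(random_input: int, random_output: int, range_ratio: float,
             request_rate: int, num_prompts: int) -> str:
    """Build the README-style comment line from the benchmark flags."""
    in_lo, in_hi = length_range(random_input, range_ratio)
    out_lo, out_hi = length_range(random_output, range_ratio)
    return (f"# Random dataset, Input [{in_lo}, {in_hi}], "
            f"Output [{out_lo}, {out_hi}], "
            f"request rate {request_rate}, num prompts {num_prompts}")


print(describe(4096, 1024, 0.125, 1, 300))
# → # Random dataset, Input [512, 4096], Output [128, 1024], request rate 1, num prompts 300
```

Deriving the comment from the flags like this would keep the two from drifting apart if the ratio or maximum lengths change again.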
...
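
The five online benchmark commands in the diff differ only in `--request-rate` and `--num-prompts`. As a rough sketch, the same command lines can be generated from a small table instead of being written out by hand. The flags and values are copied from the commands in the diff; `SWEEP` and `build_command` are illustrative names, and the code only prints the commands rather than running them.

```python
import shlex

# (request rate, number of prompts) pairs taken from the README commands.
SWEEP = [(1, 300), (2, 600), (4, 1200), (8, 2400), (16, 3200)]


def build_command(request_rate: int, num_prompts: int) -> str:
    """Return one online-benchmark command line as a shell-quoted string."""
    args = [
        "python3", "-m", "sglang.bench_serving",
        "--backend", "sglang",
        "--dataset-name", "random",
        "--random-input", "4096",
        "--random-output", "1024",
        "--random-range-ratio", "0.125",
        "--num-prompts", str(num_prompts),
        "--request-rate", str(request_rate),
        "--output-file", "sglang_online_benchmark.jsonl",
    ]
    return shlex.join(args)


for rate, prompts in SWEEP:
    print(build_command(rate, prompts))
```

The pairs are listed explicitly because the prompt count is not a fixed multiple of the rate: the last run uses 3200 prompts at rate 16, not 4800.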