sglang · Commit 1ece2cda

Fix bench latency benchmark (#1225)

Unverified commit 1ece2cda, authored Aug 28, 2024 by Liangsheng Yin and committed by GitHub on Aug 28, 2024.
Parent: c8a9e791
Showing 2 changed files with 9 additions and 6 deletions:

- .github/workflows/e2e-test.yml  +5 −0
- python/sglang/bench_latency.py  +4 −6
.github/workflows/e2e-test.yml

```diff
@@ -38,6 +38,11 @@ jobs:
           cd test/srt
           python3 -m unittest test_serving_throughput.TestServingThroughput.test_default
 
+      - name: Benchmark Serving Latency
+        timeout-minutes: 10
+        run: |
+          python3 -m sglang.bench_latency --model meta-llama/Meta-Llama-3.1-8B-Instruct --batch-size 1 --input 128 --output 8
+
       - name: Benchmark Serving Throughput (w/o RadixAttention)
         timeout-minutes: 10
         run: |
```
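The added step gives CI a fixed-shape latency check alongside the existing throughput tests. For local reproduction, here is a minimal sketch, assuming sglang is installed and the gated Llama weights are accessible; the subprocess wrapper is illustrative, not part of the repo:

```python
# Run the same one-off latency benchmark the new CI step runs.
# Flags mirror the workflow: batch size 1, 128 input tokens, 8 output tokens.
import subprocess

subprocess.run(
    [
        "python3", "-m", "sglang.bench_latency",
        "--model", "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "--batch-size", "1",
        "--input", "128",
        "--output", "8",
    ],
    check=True,
    timeout=600,  # mirrors the step's 10-minute timeout
)
```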
python/sglang/bench_latency.py

```diff
@@ -200,16 +200,14 @@ def extend(reqs, model_runner):
         tree_cache=None,
     )
     batch.prepare_for_extend(model_runner.model_config.vocab_size)
-    output = model_runner.forward(batch, ForwardMode.EXTEND)
-    next_token_ids = batch.sample(output.next_token_logits)
-    return next_token_ids, output.next_token_logits, batch
+    sample_output, logits_output = model_runner.forward(batch, ForwardMode.EXTEND)
+    return sample_output.batch_next_token_ids, logits_output.next_token_logits, batch
 
 
 def decode(input_token_ids, batch, model_runner):
     batch.prepare_for_decode(input_token_ids.cpu().numpy())
-    output = model_runner.forward(batch, ForwardMode.DECODE)
-    next_token_ids = batch.sample(output.next_token_logits)
-    return next_token_ids, output.next_token_logits
+    sample_output, logits_output = model_runner.forward(batch, ForwardMode.DECODE)
+    return sample_output.batch_next_token_ids, logits_output.next_token_logits
 
 
 @torch.inference_mode()
```
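The Python change adapts the benchmark to a new return convention of ModelRunner.forward: instead of a single output whose logits the caller sampled via batch.sample, the runner now returns a (sample_output, logits_output) pair with the next-token ids already sampled. A minimal sketch of that convention, using hypothetical stand-in classes rather than sglang's real types:

```python
from dataclasses import dataclass

import torch


@dataclass
class SampleOutput:
    batch_next_token_ids: torch.Tensor


@dataclass
class LogitsOutput:
    next_token_logits: torch.Tensor


class ToyModelRunner:
    """Hypothetical stand-in for the runner interface after this change:
    forward() samples internally and returns (SampleOutput, LogitsOutput)."""

    def __init__(self, vocab_size: int = 32):
        self.vocab_size = vocab_size

    def forward(self, batch_size: int):
        # Fake one decode step; a real runner would execute the model.
        logits = torch.randn(batch_size, self.vocab_size)
        next_ids = torch.argmax(logits, dim=-1)  # toy greedy "sampling"
        return SampleOutput(next_ids), LogitsOutput(logits)


runner = ToyModelRunner()
sample_output, logits_output = runner.forward(batch_size=2)
# The benchmark now reads ids from sample_output instead of calling
# batch.sample(...) on the logits itself.
print(sample_output.batch_next_token_ids, tuple(logits_output.next_token_logits.shape))
```

With sampling folded into forward, the benchmark's extend and decode helpers stay in step with whatever sampling path the runner itself uses.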