Commit 4d086719 (sglang)
Authored Oct 06, 2024 by HAI
Committed by GitHub, Oct 06, 2024

[Bug] Fix decode stats error on output_len 1 (#1585)

Parent: 9244f27f
Showing 1 changed file with 11 additions and 8 deletions

--- a/python/sglang/bench_latency.py
+++ b/python/sglang/bench_latency.py
@@ -340,13 +340,16 @@ def latency_test_run_once(
             rank_print(
                 f"Decode. latency: {latency:6.5f} s, throughput: {throughput:9.2f} token/s"
             )
-    med_decode_latency = np.median(decode_latencies)
-    med_decode_throughput = batch_size / med_decode_latency
-    rank_print(
-        f"Decode. median latency: {med_decode_latency:6.5f} s, median throughput: {med_decode_throughput:9.2f} token/s"
-    )
-    measurement_results["median_decode_latency"] = med_decode_latency
-    measurement_results["median_decode_throughput"] = med_decode_throughput
+
+    # record decode timing from 2nd output
+    if output_len > 1:
+        med_decode_latency = np.median(decode_latencies)
+        med_decode_throughput = batch_size / med_decode_latency
+        rank_print(
+            f"Decode. median latency: {med_decode_latency:6.5f} s, median throughput: {med_decode_throughput:9.2f} token/s"
+        )
+        measurement_results["median_decode_latency"] = med_decode_latency
+        measurement_results["median_decode_throughput"] = med_decode_throughput
 
     throughput = (input_len + output_len) * batch_size / tot_latency
     rank_print(
@@ -382,7 +385,7 @@ def latency_test(
         reqs,
         bench_args.batch_size[0],
         bench_args.input_len[0],
-        4,  # shorter decoding to speed up the warmup
+        8,  # shorter decoding to speed up the warmup
     )
     rank_print("Benchmark ...")
 
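
The guard added in the first hunk can be illustrated in isolation. The sketch below is not taken from bench_latency.py: `summarize_decode` is a hypothetical helper, and it uses the standard library's `statistics.median` in place of `np.median` so that it runs without dependencies. It assumes, as the comment added in the diff suggests, that decode latencies are recorded only from the second output token onward, so a run with `output_len` of 1 has no decode samples to summarize.

```python
from statistics import median


def summarize_decode(decode_latencies, batch_size, output_len):
    """Return median decode stats, or an empty dict when no decode step ran.

    Hypothetical helper for illustration only. Assumption: latencies are
    collected from the 2nd output token, so output_len == 1 gives an empty list.
    """
    results = {}
    # Only compute the median when at least one decode step was timed.
    if output_len > 1:
        med_decode_latency = median(decode_latencies)
        results["median_decode_latency"] = med_decode_latency
        results["median_decode_throughput"] = batch_size / med_decode_latency
    return results


# output_len == 1: nothing was decoded, so no decode stats are reported.
print(summarize_decode([], batch_size=4, output_len=1))

# output_len == 4: three decode steps were timed.
print(summarize_decode([0.02, 0.03, 0.04], batch_size=4, output_len=4))
```

Without the guard, the median would be taken over an empty list: `statistics.median` raises `StatisticsError` in that case, while `np.median` returns `nan` and emits a `RuntimeWarning`. Either way, the reported decode statistics would be meaningless.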