OpenDAS / vllm_cscc
Commit 7e357113 (Unverified)
Authored May 09, 2025 by Mark McLoughlin; committed by GitHub, May 09, 2025

[V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Parent: ea2236bf

Showing 2 changed files with 9 additions and 5 deletions:
  examples/offline_inference/eagle.py   +2 -2
  vllm/v1/spec_decode/metrics.py        +7 -3

examples/offline_inference/eagle.py
@@ -118,8 +118,8 @@ def main():
             acceptance_counts[step] += count

     print("-" * 50)
-    print(f"mean acceptance length: \
-        {sum(acceptance_counts) / acceptance_counts[0]:.2f}")
+    print(f"mean acceptance length (including bonus tokens): \
+        {1 + (sum(acceptance_counts) / acceptance_counts[0]):.2f}")
     print("-" * 50)

     # print acceptance at each token position
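
The effect of the eagle.py change on the printed value can be sketched as follows. This is a minimal illustration, not code from the commit: the helper name `mean_acceptance_length`, its `include_bonus` flag, and the sample counts are all hypothetical, and the only thing taken from the diff is the expression itself.

```python
def mean_acceptance_length(acceptance_counts, include_bonus=True):
    """Evaluate the expression printed by the example script.

    include_bonus=False reproduces the expression before this commit,
    include_bonus=True the expression after it (one extra token added).
    """
    base = sum(acceptance_counts) / acceptance_counts[0]
    return 1 + base if include_bonus else base


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
counts = [100, 60, 30]

before = mean_acceptance_length(counts, include_bonus=False)
after = mean_acceptance_length(counts)

print(f"before this commit: {before:.2f}")
print(f"after this commit (including bonus tokens): {after:.2f}")
```

Whatever the counts are, the reported figure shifts up by exactly one, so numbers printed before and after this commit are not directly comparable.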

vllm/v1/spec_decode/metrics.py
@@ -73,7 +73,9 @@ class SpecDecodingLogging:

         draft_acceptance_rate = (num_accepted_tokens / num_draft_tokens *
                                  100 if num_draft_tokens > 0 else float("nan"))
-        mean_acceptance_length = (num_accepted_tokens / num_drafts)
+
+        # Conventionally, mean acceptance length includes the bonus token
+        mean_acceptance_length = 1 + (num_accepted_tokens / num_drafts)

         pos_matrix = np.array(self.accepted_tokens_per_pos_lists)
         acceptance_rates = np.sum(pos_matrix, axis=0) / num_drafts
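
The two summary figures in this hunk can be reproduced in isolation with a short sketch. It assumes only the three counters named in the diff (`num_drafts`, `num_draft_tokens`, `num_accepted_tokens`). The function name `spec_decode_summary` is hypothetical, and the guard against `num_drafts == 0` is an added assumption that does not appear in the lines shown.

```python
import math


def spec_decode_summary(num_drafts, num_draft_tokens, num_accepted_tokens):
    """Return (draft acceptance rate in percent, mean acceptance length)."""
    # Same expression as in the diff: share of proposed draft tokens accepted.
    if num_draft_tokens > 0:
        draft_acceptance_rate = num_accepted_tokens / num_draft_tokens * 100
    else:
        draft_acceptance_rate = math.nan

    # Accepted draft tokens per draft, plus one for the bonus token.
    # The zero check is an assumption added for this sketch.
    if num_drafts > 0:
        mean_acceptance_length = 1 + num_accepted_tokens / num_drafts
    else:
        mean_acceptance_length = math.nan

    return draft_acceptance_rate, mean_acceptance_length


# Hypothetical counters: 100 drafts of 3 tokens each, 150 tokens accepted.
rate, length = spec_decode_summary(
    num_drafts=100, num_draft_tokens=300, num_accepted_tokens=150)
print(f"draft acceptance rate: {rate:.1f}%")
print(f"mean acceptance length: {length:.2f}")
```

With these inputs the old formula would have reported 1.50 and the new one reports 2.50.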
@@ -103,10 +105,12 @@ class SpecDecodingProm:
       rate(vllm:spec_decode_num_accepted_tokens_total[$interval]) /
       rate(vllm:spec_decode_num_draft_tokens_total[$interval])

-    The mean acceptance length can be calculated using:
+    The mean acceptance length (conventionally including bonus tokens)
+    can be calculated using:

+      1 + (
       rate(vllm:spec_decode_num_accepted_tokens_total[$interval]) /
-      rate(vllm:spec_decode_num_drafts[$interval])
+      rate(vllm:spec_decode_num_drafts[$interval]))

     A per-position acceptance rate vector can be computed using

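
The PromQL expression in the docstring divides two rates taken over the same `$interval`, so the per-second normalization cancels and the result is roughly the ratio of the counter increases over that window, plus one. The sketch below illustrates that arithmetic on two hypothetical counter samples. The function name and sample values are assumptions, and unlike PromQL's `rate()` it does not handle counter resets or extrapolation.

```python
import math


def mean_acceptance_length_over_window(accepted_start, accepted_end,
                                       drafts_start, drafts_end):
    """Approximate 1 + rate(accepted) / rate(drafts) from two samples.

    Each argument is a raw counter reading at the start or end of the
    window. Counter resets are NOT handled in this simplified sketch.
    """
    accepted_delta = accepted_end - accepted_start
    drafts_delta = drafts_end - drafts_start
    if drafts_delta <= 0:
        # No drafts in the window: the ratio is undefined.
        return math.nan
    return 1 + accepted_delta / drafts_delta


# Hypothetical readings: 450 tokens accepted across 300 drafts in the window.
value = mean_acceptance_length_over_window(
    accepted_start=1000, accepted_end=1450,
    drafts_start=400, drafts_end=700)
print(f"mean acceptance length over window: {value:.2f}")
```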