Unverified commit e9ec2a72, authored by Ofir Zafrir and committed by GitHub

[Bugfix] Fix stale `common_attn_metadata.max_seq_len` in speculative decoding with Eagle (#32312)


Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com>
parent 2c9b4cf5
@@ -466,6 +466,12 @@ class EagleProposer:
# For the requests that exceed the max model length, we set the
# sequence length to 1 to minimize their overheads in attention.
common_attn_metadata.seq_lens.masked_fill_(exceeds_max_model_len, 1)
# Increment the maximum sequence length. We increment max_seq_len
# unconditionally even though some seq_lens may have been capped above,
# as max_seq_len serves as an upper bound for sequence lengths.
common_attn_metadata.max_seq_len = min(
    common_attn_metadata.max_seq_len + 1, self.max_model_len
)
# Also update the CPU-side shadow; NOTE: this is hacky and should be
# removed in when common_attn_metadata.seq_lens_cpu is deprecated.
......
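The invariant that the new lines preserve can be illustrated with a small pure-Python sketch. It assumes that a request exceeds the limit once its length grows past `max_model_len`. `AttnMetadataSketch`, `advance_one_draft_step`, and `bound_holds` are hypothetical names, not vLLM APIs, and plain lists stand in for the device tensors that the real code updates in place with `masked_fill_`. Calling the step with `fix_max_seq_len=False` reproduces the stale bound described in the commit title: some entry of `seq_lens` ends up larger than `max_seq_len`, even though the diff's comment says `max_seq_len` must serve as an upper bound for sequence lengths.

```python
from dataclasses import dataclass


@dataclass
class AttnMetadataSketch:
    """Hypothetical stand-in for common_attn_metadata (not vLLM's class)."""
    seq_lens: list[int]  # per-request sequence lengths
    max_seq_len: int     # must stay an upper bound on every seq_lens entry


def advance_one_draft_step(meta: AttnMetadataSketch, max_model_len: int,
                           fix_max_seq_len: bool = True) -> None:
    """Advance every request by one drafted token (sketch of the diff's logic)."""
    new_lens = []
    for n in meta.seq_lens:
        n += 1
        # Assumption: requests past max_model_len are capped to length 1 so
        # they cost almost nothing in attention (cf. masked_fill_(..., 1)).
        new_lens.append(1 if n > max_model_len else n)
    meta.seq_lens = new_lens
    if fix_max_seq_len:
        # Increment unconditionally; capped entries only make the bound
        # looser, never wrong. Clamp to the model's maximum length.
        meta.max_seq_len = min(meta.max_seq_len + 1, max_model_len)


def bound_holds(meta: AttnMetadataSketch) -> bool:
    """True if max_seq_len is still a valid upper bound on all seq_lens."""
    return all(n <= meta.max_seq_len for n in meta.seq_lens)


# Usage: three draft steps with max_model_len=10.
meta = AttnMetadataSketch(seq_lens=[5, 8], max_seq_len=8)
for _ in range(3):
    advance_one_draft_step(meta, max_model_len=10)
# seq_lens == [8, 1], max_seq_len == 10: the second request was capped.

# Without the update, max_seq_len goes stale after a single step.
stale = AttnMetadataSketch(seq_lens=[5, 8], max_seq_len=8)
advance_one_draft_step(stale, max_model_len=10, fix_max_seq_len=False)
# stale.seq_lens == [6, 9] but stale.max_seq_len is still 8.
```

Incrementing unconditionally, instead of recomputing `max(seq_lens)`, keeps the update O(1) and avoids reading device data back to the host. The cost is a bound that may be looser than necessary once some requests have been capped.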