Commit b2ea5ba6 (unverified) authored by 7mile, committed by GitHub


[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs (#26231)
Signed-off-by: seven-mile <i@7li.moe>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
parent 824a3f40
@@ -522,10 +522,6 @@ class EagleProposer:
        )
        # Generate a mask for all valid tokens within those requests
        max_gen_len = sampled_token_ids.shape[-1]
        if max_gen_len == 1:
            valid_mask = torch.ones_like(valid_sampled_token_ids_gpu, dtype=torch.bool)
        else:
            valid_mask = (valid_sampled_token_ids_gpu != -1) & (
                valid_sampled_token_ids_gpu < gpu_input_batch.vocab_size
            )
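
The hunk header shrinks this region from 10 lines to 6, which, together with the commit title, suggests that the `max_gen_len == 1` shortcut is dropped and the mask is always computed from the token values. The sketch below is a toy reproduction of why the shortcut can go wrong, not the project's code. It assumes that `-1` marks a rejected or discarded slot (as the `!= -1` comparison implies) and that a request still in chunked prefill carries such a placeholder. The function names, the `PLACEHOLDER` constant, and the sample batch are hypothetical.

```python
import torch

# Assumption: -1 marks a slot with no usable sampled token.
PLACEHOLDER = -1


def mask_with_shortcut(sampled: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """Logic with the max_gen_len == 1 shortcut, as shown in the hunk."""
    if sampled.shape[-1] == 1:
        # Shortcut: every slot is treated as valid without inspecting it.
        return torch.ones_like(sampled, dtype=torch.bool)
    return (sampled != PLACEHOLDER) & (sampled < vocab_size)


def mask_always_checked(sampled: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """Mask computed from the token values regardless of max_gen_len."""
    return (sampled != PLACEHOLDER) & (sampled < vocab_size)


# Toy batch: three requests, one sampled slot each (max_gen_len == 1).
# Request 1 stands in for a request that is still mid-prefill.
sampled = torch.tensor([[17], [PLACEHOLDER], [42]])
vocab_size = 100

# Number of valid tokens per request under each variant.
old_counts = mask_with_shortcut(sampled, vocab_size).sum(dim=1)
new_counts = mask_always_checked(sampled, vocab_size).sum(dim=1)

print(old_counts.tolist())  # [1, 1, 1]
print(new_counts.tolist())  # [1, 0, 1]
```

With the shortcut, the placeholder in request 1 is counted as a valid token. Checking the values directly reports zero valid tokens for that request, so downstream code can tell it apart from requests that actually produced a token.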