[BugFix] Fix mixed penalties batch with async scheduling (#27910)

Signed-off-by: Nick Hill <nhill@redhat.com>

Commit c2ed069b, authored Nov 01, 2025 by Nick Hill, committed by GitHub Nov 01, 2025

Showing with 8 additions and 0 deletions

vllm/v1/sample/ops/penalties.py +8 -0

--- a/vllm/v1/sample/ops/penalties.py
+++ b/vllm/v1/sample/ops/penalties.py
@@ -21,6 +21,14 @@ def apply_all_penalties(
    """
    _, vocab_size = logits.shape
    output_tokens_t = _convert_to_tensors(output_token_ids, vocab_size, logits.device)
+
+    # In the async scheduling case, rows that won't have penalties applied may contain
+    # -1 placeholder token ids. We must replace these with valid token ids so that the
+    # scatter done in apply_penalties is valid.
+    # NOTE(nick): The penalties implementation is currently quite inefficient and
+    # will be reworked anyhow.
+    output_tokens_t.masked_fill_(output_tokens_t == -1, vocab_size)
+
    return apply_penalties(
        logits,
        prompt_token_ids,
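
The reasoning in the added comment can be illustrated with a dependency-free sketch. It assumes, which the diff itself does not show, that the counting step behind `apply_penalties` scatters into `vocab_size + 1` bins per row and then drops the last bin, so `vocab_size` is a safe "ignore me" index while `-1` is not. The helpers `replace_placeholders` and `count_output_tokens` are hypothetical stand-ins written for this illustration, not vLLM functions, and plain lists stand in for tensors.

```python
def replace_placeholders(output_token_ids, vocab_size, placeholder=-1):
    """Hypothetical list-based analogue of
    output_tokens_t.masked_fill_(output_tokens_t == -1, vocab_size)."""
    return [
        [vocab_size if tok == placeholder else tok for tok in row]
        for row in output_token_ids
    ]


def count_output_tokens(output_token_ids, vocab_size):
    """Hypothetical stand-in for the scatter step: per-row token counts.

    Assumption: counts go into vocab_size + 1 bins, and the extra bin
    (index vocab_size) exists only to absorb padding/placeholder ids.
    """
    counts = []
    for row in output_token_ids:
        bins = [0] * (vocab_size + 1)
        for tok in row:
            # A real scatter needs every index in range; -1 is not.
            if not 0 <= tok <= vocab_size:
                raise IndexError(f"token id {tok} is not a valid scatter index")
            bins[tok] += 1
        # Drop the extra bin so placeholders never affect real tokens.
        counts.append(bins[:vocab_size])
    return counts


vocab_size = 4
# Second row mimics an async-scheduled request whose ids are still -1.
rows = [[1, 1, 3], [-1, 2, -1]]

safe_rows = replace_placeholders(rows, vocab_size)
counts = count_output_tokens(safe_rows, vocab_size)
```

Under these assumptions, the placeholder entries land in the discarded bin, so they neither raise nor contribute to any real token's count; passing `rows` directly to `count_output_tokens` would raise instead.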