[FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches (#22896)

Signed-off-by: JartX <sagformas@epdcenter.es>

Commit 68af77e5, authored Aug 15, 2025 by JartX, committed by GitHub Aug 15, 2025

Showing 4 additions and 3 deletions

vllm/v1/worker/gpu_model_runner.py +4 -3

--- a/vllm/v1/worker/gpu_model_runner.py
+++ b/vllm/v1/worker/gpu_model_runner.py
@@ -1337,9 +1337,10 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
        out_indices = []

        # Reorder the bitmask to match the order of the requests in the batch.
-        sorted_bitmask = np.zeros_like(grammar_bitmask,
-                                       shape=(logits.shape[0],
-                                              grammar_bitmask.shape[1]))
+        sorted_bitmask = np.full(shape=(logits.shape[0],
+                                        grammar_bitmask.shape[1]),
+                                 fill_value=-1,
+                                 dtype=grammar_bitmask.dtype)
        cumulative_index = 0
        seq = sorted(scheduler_output.structured_output_request_ids.items(),
                     key=lambda x: x[1])
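
Note on the fill value: the sketch below illustrates why the buffer is initialized with -1 instead of 0. It assumes the packed convention in which each int32 word covers 32 tokens and a set bit means "token allowed", so -1 (all bits set) leaves a row unconstrained while 0 forbids every token. `apply_bitmask` is a hypothetical NumPy stand-in written for this example, not the routine vLLM actually calls, and the batch layout (three requests, only row 1 grammar-constrained) is invented for illustration.

```python
import numpy as np


def apply_bitmask(logits: np.ndarray, bitmask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: set logits to -inf where the packed bit is 0."""
    vocab_size = logits.shape[1]
    # Reinterpret int32 words as uint32 so right shifts are plain bit shifts.
    words = bitmask.view(np.uint32)
    token_ids = np.arange(vocab_size)
    word_idx = token_ids // 32
    bit_idx = (token_ids % 32).astype(np.uint32)
    allowed = (words[:, word_idx] >> bit_idx) & np.uint32(1)
    return np.where(allowed.astype(bool), logits, -np.inf)


vocab_size, num_words = 40, 2  # 2 int32 words cover 40 tokens
logits = np.zeros((3, vocab_size), dtype=np.float32)

# Assumed grammar output: one constrained request allowing tokens 3 and 35.
grammar_bitmask = np.zeros((1, num_words), dtype=np.int32)
grammar_bitmask[0, 0] = 1 << 3   # token 3
grammar_bitmask[0, 1] = 1 << 3   # token 32 + 3 = 35

# Old initialization: rows without a grammar stay all-zero.
old = np.zeros_like(grammar_bitmask, shape=(logits.shape[0], num_words))
# New initialization: rows without a grammar stay all-ones.
new = np.full(shape=(logits.shape[0], num_words),
              fill_value=-1,
              dtype=grammar_bitmask.dtype)

# Place the constrained request at batch row 1 in both buffers.
old[1] = grammar_bitmask[0]
new[1] = grammar_bitmask[0]

masked_old = apply_bitmask(logits, old)
masked_new = apply_bitmask(logits, new)
```

Under these assumptions, applying `old` to the whole batch masks every token of the unconstrained rows 0 and 2, whereas `new` leaves those rows untouched and restricts only row 1 to its allowed tokens.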