[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference (#32384)

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>

Commit 821fde2d, authored Mar 14, 2026 by Karan Bansal; committed by GitHub Mar 14, 2026

Showing with 12 additions and 1 deletion

vllm/v1/structured_output/utils.py +12 -1

--- a/vllm/v1/structured_output/utils.py
+++ b/vllm/v1/structured_output/utils.py
@@ -116,6 +116,17 @@ def apply_grammar_bitmask(
        )
        index_tensor = index_tensor.to(logits.device, non_blocking=True)
+    # Handle dtype conversion for CPU (older xgrammar CPU kernels require float32)
+    # See: https://github.com/vllm-project/vllm/issues/31901
+    if logits.device.type == "cpu" and logits.dtype != torch.float32:
+        # Convert to float32, apply bitmask, then convert back
+        logits_float32 = logits.to(torch.float32)
+        xgr.apply_token_bitmask_inplace(
+            logits_float32, grammar_bitmask, indices=index_tensor
+        )
+        # Copy the modified values back to the original tensor
+        logits.copy_(logits_float32.to(logits.dtype))
+    else:
-    xgr.apply_token_bitmask_inplace(logits, grammar_bitmask, indices=index_tensor)
+        xgr.apply_token_bitmask_inplace(logits, grammar_bitmask, indices=index_tensor)
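
The change above follows a convert, apply, copy-back pattern: the in-place kernel runs on a temporary float32 copy, and the result is written back into the original tensor so callers still observe an in-place update in the original dtype. A minimal sketch of that pattern in plain PyTorch is shown below. It does not call xgrammar; `float32_only_mask_` and `apply_mask_preserving_dtype` are hypothetical names, and the boolean `allowed` mask is an assumed stand-in for the packed grammar bitmask.

```python
import torch


def float32_only_mask_(logits: torch.Tensor, allowed: torch.Tensor) -> None:
    """Hypothetical stand-in for a kernel that only accepts float32 logits."""
    if logits.dtype != torch.float32:
        raise TypeError(f"expected float32 logits, got {logits.dtype}")
    # Disallowed positions are set to -inf in place.
    logits.masked_fill_(~allowed, float("-inf"))


def apply_mask_preserving_dtype(logits: torch.Tensor, allowed: torch.Tensor) -> None:
    """Mask logits in place, round-tripping through float32 on CPU if needed."""
    if logits.device.type == "cpu" and logits.dtype != torch.float32:
        # Temporary float32 copy that the stand-in kernel can operate on.
        logits_float32 = logits.to(torch.float32)
        float32_only_mask_(logits_float32, allowed)
        # Write the result back so the caller's tensor is updated in place.
        logits.copy_(logits_float32.to(logits.dtype))
    else:
        float32_only_mask_(logits, allowed)


# bfloat16 logits on CPU, as in the case the commit targets.
logits = torch.tensor([[0.5, 1.0, -2.0, 3.0]], dtype=torch.bfloat16)
allowed = torch.tensor([[True, False, True, False]])
apply_mask_preserving_dtype(logits, allowed)
print(logits.dtype)
print(logits)
```

The extra copy costs one float32 allocation per call, which is why the conversion is limited to the CPU, non-float32 case and every other path calls the kernel directly.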