[BugFix][Spec Decode] Use float64 for uniform_probs (#23803)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Commit a3432f18, authored Aug 28, 2025 by Woosuk Kwon; committed by GitHub Aug 28, 2025

Showing with 7 additions and 2 deletions

examples/offline_inference/spec_decode.py +1 -1

vllm/v1/sample/rejection_sampler.py +6 -1

--- a/examples/offline_inference/spec_decode.py
+++ b/examples/offline_inference/spec_decode.py
@@ -138,7 +138,7 @@ def main():
    sampling_params = SamplingParams(temperature=args.temp, max_tokens=args.output_len)
    if not args.custom_mm_prompts:
        outputs = llm.generate(
-            TokensPrompt(prompt_token_ids=prompt_ids),
+            [TokensPrompt(prompt_token_ids=x) for x in prompt_ids],
            sampling_params=sampling_params,
        )
    else:

--- a/vllm/v1/sample/rejection_sampler.py
+++ b/vllm/v1/sample/rejection_sampler.py
@@ -365,9 +365,14 @@ def generate_uniform_probs(
            A tensor of shape `(num_tokens, )` containing uniform
            random values in the range [0, 1).
    """
+    # NOTE(woosuk): We deliberately use float64 instead of float32 here
+    # because when using float32, there's a non-negligible chance that
+    # uniform_prob is sampled to be exactly 0.0 as reported in
+    # https://github.com/pytorch/pytorch/issues/16706. Using float64
+    # mitigates the issue.
    uniform_probs = torch.rand(
        (num_tokens, ),
-        dtype=torch.float32,
+        dtype=torch.float64,
        device=device,
    )
    start_idx = 0
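
To make the NOTE in `generate_uniform_probs` concrete, here is a minimal sketch of why an exact 0.0 is more likely in float32 and why it could matter. It rests on assumptions that this commit does not confirm: that a uniform generator draws `k` random bits and scales them by `2**-k` (with `k = 24` for float32 and `k = 53` for float64, the respective significand widths), and that the sampler accepts a draft token when `target_prob / draft_prob >= uniform_prob`. The helper names `prob_exact_zero`, `expected_zeros`, and `accepts` are hypothetical and are not part of vLLM or PyTorch.

```python
from fractions import Fraction

# Assumed number of random bits behind each uniform sample.
# These match the significand widths of the two dtypes; the actual
# bit count used by torch.rand may differ per backend.
FLOAT32_BITS = 24
FLOAT64_BITS = 53


def prob_exact_zero(random_bits: int) -> Fraction:
    """Chance that u = k / 2**random_bits is exactly 0 for uniform integer k."""
    return Fraction(1, 2 ** random_bits)


def expected_zeros(num_samples: int, random_bits: int) -> float:
    """Expected count of exact-zero samples among num_samples draws."""
    return float(num_samples * prob_exact_zero(random_bits))


def accepts(target_prob: float, draft_prob: float, uniform_prob: float) -> bool:
    """Hypothetical acceptance test for a single draft token (an assumption)."""
    return draft_prob > 0 and target_prob / draft_prob >= uniform_prob


if __name__ == "__main__":
    num_tokens = 10 ** 9  # illustrative volume of sampled draft tokens
    print("float32:", expected_zeros(num_tokens, FLOAT32_BITS))
    print("float64:", expected_zeros(num_tokens, FLOAT64_BITS))
    # With u == 0.0, a token the target model assigns zero probability
    # would still pass the assumed test, because 0.0 >= 0.0 holds.
    print("accept at u=0.0:", accepts(0.0, 0.5, 0.0))
    print("accept at u>0:  ", accepts(0.0, 0.5, 1e-9))
```

Under these assumptions, roughly one draw in 2**24 (about 16.8 million) is an exact zero in float32, which is reachable at serving scale, while the float64 rate of one in 2**53 is negligible. This is consistent with the NOTE describing float64 as a mitigation rather than a guarantee.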