fix: efficiently fill dummy data to bypass vLLM preprocessor in E/P/D (#6968)

Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>

Commit 503d042a (unverified), authored Mar 05, 2026 by GuanLuo; committed by GitHub Mar 06, 2026

Showing with 4 additions and 4 deletions

components/src/dynamo/vllm/multimodal_utils/model.py +4 -4

--- a/components/src/dynamo/vllm/multimodal_utils/model.py
+++ b/components/src/dynamo/vllm/multimodal_utils/model.py
@@ -271,10 +271,10 @@ def construct_qwen_decode_mm_data(
    # WAR: Use request_id hash as seed for unique placeholder values.
    # This prevents prefix cache from incorrectly matching different images
    # that happen to have the same dimensions (same image_grid_thw).
-    seed = hash(request_id) & 0xFFFFFFFF  # Convert to positive 32-bit int
-    generator = torch.Generator().manual_seed(seed)
-    image_embeds = torch.randn(
-        embeddings_shape, dtype=dtype, device="cpu", generator=generator
+    # bit ops to convert request ID to somewhat unique value that fits in the dtype range
+    fill_value = hash(request_id) & ((1 << (dtype.itemsize * 8)) - 1)
+    image_embeds = torch.full(
+        embeddings_shape, fill_value=fill_value, dtype=dtype, device="cpu"
    )
    if image_embeds.ndim == 3:
        image_embeds = image_embeds.squeeze(0)
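
The masking expression in the hunk above can be tried in isolation. The following is a minimal sketch in plain Python, not the code from the commit: `placeholder_fill_value` is a hypothetical helper name, and `itemsize` is passed in directly as a stand-in for `dtype.itemsize` (for example 2 for a 16-bit dtype, 4 for a 32-bit dtype) so that the snippet does not depend on torch.

```python
def placeholder_fill_value(request_id: str, itemsize: int) -> int:
    """Hypothetical helper mirroring the masking expression in the diff.

    itemsize is the element size in bytes (a stand-in for dtype.itemsize).
    """
    bits = itemsize * 8
    mask = (1 << bits) - 1  # e.g. 0xFFFF for 2 bytes, 0xFFFFFFFF for 4 bytes
    # In Python, AND-ing with a non-negative mask always gives a
    # non-negative result, even when hash() returns a negative integer.
    return hash(request_id) & mask


if __name__ == "__main__":
    for itemsize in (2, 4):
        value = placeholder_fill_value("req-123", itemsize)
        print(f"itemsize={itemsize}: fill_value={value}")
```

Python salts `hash()` for strings per process by default, so the printed values are stable only within a single process; the mask only guarantees a non-negative integer below 2^(8 * itemsize).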