Unverified commit 503d042a, authored by GuanLuo, committed by GitHub

fix: efficiently fill dummy data to bypass vLLM preprocessor in E/P/D (#6968)


Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>
parent f99b78f0
@@ -271,10 +271,10 @@ def construct_qwen_decode_mm_data(
     # WAR: Use request_id hash as seed for unique placeholder values.
     # This prevents prefix cache from incorrectly matching different images
     # that happen to have the same dimensions (same image_grid_thw).
-    seed = hash(request_id) & 0xFFFFFFFF  # Convert to positive 32-bit int
-    generator = torch.Generator().manual_seed(seed)
-    image_embeds = torch.randn(
-        embeddings_shape, dtype=dtype, device="cpu", generator=generator
-    )
+    # bit ops to convert request ID to somewhat unique value that fits in the dtype range
+    fill_value = hash(request_id) & ((1 << (dtype.itemsize * 8)) - 1)
+    image_embeds = torch.full(
+        embeddings_shape, fill_value=fill_value, dtype=dtype, device="cpu"
+    )
     if image_embeds.ndim == 3:
         image_embeds = image_embeds.squeeze(0)
......
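
The masking step in the added lines can be tried in isolation. The following is a minimal sketch using only the standard library, not the project's code: `placeholder_fill_value` is a hypothetical helper name, and the `itemsize` argument stands in for `dtype.itemsize` from the diff. It shows only how the hash is reduced to the low `itemsize * 8` bits. In the commit, the result is then passed to `torch.full`.

```python
def placeholder_fill_value(request_id: str, itemsize: int) -> int:
    """Reduce hash(request_id) to its low `itemsize * 8` bits.

    Hypothetical helper for illustration; `itemsize` plays the role of
    `dtype.itemsize` in the diff (e.g. 2 for a 16-bit dtype).
    """
    bit_width = itemsize * 8
    mask = (1 << bit_width) - 1  # e.g. 0xFFFF when itemsize == 2
    # hash() can be negative; AND-ing with a non-negative mask
    # always yields a non-negative int in Python.
    return hash(request_id) & mask


# Usage: the same request ID maps to the same value within one process.
for size in (2, 4):
    value = placeholder_fill_value("req-123", size)
    print(f"itemsize={size}: fill_value={value}")
```

Two caveats apply to this sketch. Python randomizes `str` hashes per process unless `PYTHONHASHSEED` is set, so the value is stable within a process but not necessarily across processes. Also, the mask bounds the integer by bit width only: a 16-bit mask allows values up to 65535, while the largest finite float16 value is 65504. The source does not say which dtypes are used in practice.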