[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP (#17916)

Signed-off-by: Jin Huang <jinhun@amazon.com>
Co-authored-by: Jin Huang <jinhun@amazon.com>
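
A minimal, self-contained sketch of why the extra `is_first_rank` condition matters under pipeline parallelism (PP). It assumes, as a simplification, that only the first PP stage holds embedding weights and that later stages compute on hidden states received from the previous stage. `ToyPPGroup`, `ToyEmbedding`, `MissingEmbedding`, `prepare_inputs`, and `build_stage_embed` are hypothetical names for illustration, not vLLM APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Embeds = List[List[float]]


@dataclass
class ToyPPGroup:
    """Hypothetical stand-in for the object returned by get_pp_group()."""
    rank_in_group: int
    world_size: int

    @property
    def is_first_rank(self) -> bool:
        return self.rank_in_group == 0


class ToyEmbedding:
    """Toy embedding table: token id t maps to [t, t, ..., t]."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def __call__(self, token_ids: Sequence[int]) -> Embeds:
        return [[float(t)] * self.dim for t in token_ids]


class MissingEmbedding:
    """Assumed placeholder for a stage that owns no embedding weights."""

    def __call__(self, token_ids: Sequence[int]) -> Embeds:
        raise RuntimeError("no embedding layer on this pipeline stage")


def build_stage_embed(pp: ToyPPGroup, dim: int):
    # Hypothetical layout: only rank 0 holds real embedding weights.
    return ToyEmbedding(dim) if pp.is_first_rank else MissingEmbedding()


def prepare_inputs(
    pp: ToyPPGroup, is_multimodal: bool, token_ids: List[int], embed
) -> Tuple[Optional[List[int]], Optional[Embeds]]:
    """Return (input_ids, inputs_embeds) for one stage, mirroring the fixed check."""
    if is_multimodal and pp.is_first_rank:
        # Only the first stage turns tokens into embeddings.
        return None, embed(token_ids)
    # Text-only models and every later stage skip the embedding lookup;
    # a later stage works on hidden states handed over by the previous one.
    return token_ids, None


if __name__ == "__main__":
    for rank in range(2):
        pp = ToyPPGroup(rank_in_group=rank, world_size=2)
        ids, embeds = prepare_inputs(pp, True, [3, 7], build_stage_embed(pp, dim=2))
        print(rank, ids, embeds)
    # 0 None [[3.0, 3.0], [7.0, 7.0]]
    # 1 [3, 7] None
```

With the pre-fix condition (`if is_multimodal`), rank 1 would call its `MissingEmbedding` and raise; gating on `is_first_rank` keeps the embedding lookup on the only stage in this sketch that can perform it.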

Unverified commit 8dd0671b, authored May 13, 2025 by Jin Huang and committed by GitHub on May 13, 2025.

Changed file: vllm/v1/worker/gpu_model_runner.py (1 addition, 1 deletion)

--- a/vllm/v1/worker/gpu_model_runner.py
+++ b/vllm/v1/worker/gpu_model_runner.py
@@ -1107,7 +1107,7 @@ class GPUModelRunner(LoRAModelRunnerMixin):
        else:
            mm_embeds = []

-        if self.is_multimodal_model:
+        if self.is_multimodal_model and get_pp_group().is_first_rank:
            # NOTE(woosuk): To unify token ids and soft tokens (vision
            # embeddings), we always use embeddings (rather than token ids)
            # as input to the multimodal model, even when the input is text.
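
The NOTE in the hunk says multimodal models always receive embeddings, even for text-only input, so that ordinary token ids and vision "soft tokens" share one input format. A minimal sketch of that idea, assuming soft tokens are spliced in at placeholder-token positions; `IMAGE_TOKEN_ID`, `embed_text`, and `merge_soft_tokens` are hypothetical and not vLLM's implementation.

```python
from typing import List, Sequence

Embeds = List[List[float]]

IMAGE_TOKEN_ID = 99  # hypothetical placeholder token id for image patches


def embed_text(token_ids: Sequence[int], dim: int) -> Embeds:
    # Toy text embedding table: token id t maps to [t, t, ..., t].
    return [[float(t)] * dim for t in token_ids]


def merge_soft_tokens(
    token_ids: Sequence[int], text_embeds: Embeds, mm_embeds: Embeds
) -> Embeds:
    """Replace each placeholder position with the next multimodal embedding, in order."""
    positions = [i for i, t in enumerate(token_ids) if t == IMAGE_TOKEN_ID]
    if len(positions) != len(mm_embeds):
        raise ValueError(
            f"{len(positions)} placeholders but {len(mm_embeds)} multimodal embeddings"
        )
    merged = [row[:] for row in text_embeds]  # copy so the input is not mutated
    for pos, vec in zip(positions, mm_embeds):
        merged[pos] = list(vec)
    return merged


if __name__ == "__main__":
    ids = [5, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 6]
    vision = [[0.5, 0.5], [0.25, 0.25]]
    print(merge_soft_tokens(ids, embed_text(ids, 2), vision))
    # [[5.0, 5.0], [0.5, 0.5], [0.25, 0.25], [6.0, 6.0]]
```

For text-only input there are no placeholders and `mm_embeds` is empty, so the function simply returns the text embeddings; the model still sees the same embedding-shaped input either way.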