chore(multimodal): add notes/warn about encoder disagg not supporting video_url yet (#7890)

210bbf5d · Ryan McCormick · GitHub · 3292ed1b · 210bbf5d · 210bbf5d
Unverified Commit 210bbf5d authored Apr 06, 2026 by Ryan McCormick Committed by GitHub Apr 06, 2026
3 changed files
--- a/components/src/dynamo/vllm/handlers.py
+++ b/components/src/dynamo/vllm/handlers.py
@@ -419,6 +419,11 @@ class BaseWorkerHandler(ABC, Generic[RequestT, ResponseT]):
        # Without encode worker, the embedding will be generated internally by vLLM.
        if encode_worker_client is None:
            return None
+        logger.warning(
+            "Separate multimodal encode-worker routing only applies to image_url "
+            "inputs. video_url inputs are not sent to the encode worker and will "
+            "be processed on the prefill/PD worker instead."
+        )
        # Embedding loader consist of two main components:
        # 1) An remote encode worker client and matching embedding receiver,
        #    which can request remote encode and handle the transfer of embeddings

--- a/docs/features/multimodal/encoder-disaggregation.md
+++ b/docs/features/multimodal/encoder-disaggregation.md
@@ -29,7 +29,7 @@ For simple deployments or development/testing, the aggregated (EPD) pattern is e

 | Backend | E/PD | E/P/D | Notes |
 |---------|------|-------|-------|
-| **vLLM** | ✅ | ✅ | NIXL transfer for embeddings; NIXL KV cache transfer for P/D |
+| **vLLM** | ✅ | ✅ | Separate encode worker currently handles `image_url` inputs; `video_url` inputs stay on the prefill/PD path |
 | **TRT-LLM** | ❌ | ✅ | Supports image URLs (via `MultimodalEncoder`) and pre-computed embeddings (via NIXL) |
 | **SGLang** | ✅ | ✅ | NIXL for embeddings; bootstrap mechanism for P/D KV transfer |


--- a/docs/features/multimodal/multimodal-vllm.md
+++ b/docs/features/multimodal/multimodal-vllm.md
@@ -109,7 +109,11 @@ curl http://localhost:8000/v1/chat/completions \

 ### E/PD Serving (Encode + PD)

-Use `disagg_multimodal_e_pd.sh` when you want a separate encode worker and a combined prefill/decode worker. This path is primarily useful for image-centric workloads and embedding-cache experiments; use `agg_multimodal.sh` or `disagg_multimodal_epd.sh` for general video serving.
+Use `disagg_multimodal_e_pd.sh` when you want a separate encode worker and a combined prefill/decode worker. This path is primarily useful for image-centric workloads and embedding-cache experiments.
+
+<Warning>
+When a separate encode worker is deployed with the current vLLM path, only `image_url` inputs are routed to it. `video_url` inputs are still processed on the combined PD worker.
+</Warning>

 ```bash
 cd $DYNAMO_HOME/examples/backends/vllm
@@ -124,7 +128,11 @@ bash launch/disagg_multimodal_e_pd.sh --model Qwen/Qwen3-VL-2B-Instruct --single

 ### E/P/D Serving (Full Disaggregation)

-Use the full disaggregated launcher when you want separate encode, prefill, and decode workers for image/video workloads:
+Use `disagg_multimodal_epd.sh` when you want separate encode, prefill, and decode workers for multimodal workloads.
+
+<Warning>
+In the current vLLM implementation, the separate encode worker is only used for `image_url` inputs. `video_url` inputs are still processed on the prefill worker, not on the encode worker.
+</Warning>

 ```bash
 cd $DYNAMO_HOME/examples/backends/vllm