"components/vscode:/vscode.git/clone" did not exist on "d688aa68e5a0827cc10dfe127e124bee66aab0ea"
Unverified Commit 210bbf5d authored by Ryan McCormick's avatar Ryan McCormick Committed by GitHub
Browse files

chore(multimodal): add notes/warn about encoder disagg not supporting video_url yet (#7890)

parent 3292ed1b
......@@ -419,6 +419,11 @@ class BaseWorkerHandler(ABC, Generic[RequestT, ResponseT]):
# Without encode worker, the embedding will be generated internally by vLLM.
if encode_worker_client is None:
return None
logger.warning(
"Separate multimodal encode-worker routing only applies to image_url "
"inputs. video_url inputs are not sent to the encode worker and will "
"be processed on the prefill/PD worker instead."
)
# Embedding loader consist of two main components:
# 1) An remote encode worker client and matching embedding receiver,
# which can request remote encode and handle the transfer of embeddings
......
......@@ -29,7 +29,7 @@ For simple deployments or development/testing, the aggregated (EPD) pattern is e
| Backend | E/PD | E/P/D | Notes |
|---------|------|-------|-------|
| **vLLM** | ✅ | ✅ | NIXL transfer for embeddings; NIXL KV cache transfer for P/D |
| **vLLM** | ✅ | ✅ | Separate encode worker currently handles `image_url` inputs; `video_url` inputs stay on the prefill/PD path |
| **TRT-LLM** | ❌ | ✅ | Supports image URLs (via `MultimodalEncoder`) and pre-computed embeddings (via NIXL) |
| **SGLang** | ✅ | ✅ | NIXL for embeddings; bootstrap mechanism for P/D KV transfer |
......
......@@ -109,7 +109,11 @@ curl http://localhost:8000/v1/chat/completions \
### E/PD Serving (Encode + PD)
Use `disagg_multimodal_e_pd.sh` when you want a separate encode worker and a combined prefill/decode worker. This path is primarily useful for image-centric workloads and embedding-cache experiments; use `agg_multimodal.sh` or `disagg_multimodal_epd.sh` for general video serving.
Use `disagg_multimodal_e_pd.sh` when you want a separate encode worker and a combined prefill/decode worker. This path is primarily useful for image-centric workloads and embedding-cache experiments.
<Warning>
When a separate encode worker is deployed with the current vLLM path, only `image_url` inputs are routed to it. `video_url` inputs are still processed on the combined PD worker.
</Warning>
```bash
cd $DYNAMO_HOME/examples/backends/vllm
......@@ -124,7 +128,11 @@ bash launch/disagg_multimodal_e_pd.sh --model Qwen/Qwen3-VL-2B-Instruct --single
### E/P/D Serving (Full Disaggregation)
Use the full disaggregated launcher when you want separate encode, prefill, and decode workers for image/video workloads:
Use `disagg_multimodal_epd.sh` when you want separate encode, prefill, and decode workers for multimodal workloads.
<Warning>
In the current vLLM implementation, the separate encode worker is only used for `image_url` inputs. `video_url` inputs are still processed on the prefill worker, not on the encode worker.
</Warning>
```bash
cd $DYNAMO_HOME/examples/backends/vllm
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment