This document provides a comprehensive guide for multimodal inference using SGLang backend in Dynamo. SGLang multimodal uses specialized **E/PD or E/P/D** flows with **NIXL (RDMA)** for zero-copy tensor transfer.
This document provides a comprehensive guide for multimodal inference using SGLang backend in Dynamo. SGLang multimodal supports **EPD**, **E/PD**, and **E/P/D** flows, with NIXL (RDMA) for zero-copy tensor transfer in disaggregated modes.
## Support Matrix
...
...
@@ -36,12 +36,12 @@ This document provides a comprehensive guide for multimodal inference using SGLa
## Deployment Patterns
SGLang supports E/PD and E/P/D patterns only (always has a separate encode worker). See [Multimodal Architecture Patterns](index.md#architecture-patterns) for detailed explanations.
SGLang supports EPD, E/PD, and E/P/D patterns. See [Multimodal Architecture Patterns](index.md#architecture-patterns) for detailed explanations.
- worker: [DecodeWorkerHandler](../../components/src/dynamo/sglang/request_handlers/llm/decode_handler.py) handles encoding, prefilling, and decoding in a single process.
### Workflow
The `DecodeWorkerHandler` receives multimodal requests with image URLs and passes them directly to SGLang's engine. SGLang's internal `mm_data_processor` handles image fetching, loading, encoding, and token expansion.
```mermaid
flowchart LR
HTTP --> worker
worker --tokenized text + image_urls--> SGLang[SGLang Engine]