@@ -11,7 +11,6 @@ Dynamo supports multimodal inference across multiple LLM backends, enabling mode
...
@@ -11,7 +11,6 @@ Dynamo supports multimodal inference across multiple LLM backends, enabling mode
**Security Requirement**: Multimodal processing must be explicitly enabled at startup. See the relevant backend documentation ([vLLM](multimodal-vllm.md), [SGLang](multimodal-sglang.md), [TRT-LLM](multimodal-trtllm.md)) for the necessary flags. This prevents unintended processing of multimodal data from untrusted sources.
**Security Requirement**: Multimodal processing must be explicitly enabled at startup. See the relevant backend documentation ([vLLM](multimodal-vllm.md), [SGLang](multimodal-sglang.md), [TRT-LLM](multimodal-trtllm.md)) for the necessary flags. This prevents unintended processing of multimodal data from untrusted sources.
</Warning>
</Warning>
## Key Features
```mermaid
```mermaid
---
---
title: Sample flow for an aggregated VLM serving scenario
title: Sample flow for an aggregated VLM serving scenario
...
@@ -28,6 +27,9 @@ flowchart TD
...
@@ -28,6 +27,9 @@ flowchart TD
C --> I[DECODE]
C --> I[DECODE]
H --> I
H --> I
I --> J[Response]
I --> J[Response]
```
## Key Features
Dynamo provides support for improving latency and throughput for vision-and-language workloads through the following features, that can be used together or separately, depending on your workload characteristics:
Dynamo provides support for improving latency and throughput for vision-and-language workloads through the following features, that can be used together or separately, depending on your workload characteristics:
| Feature | Description |
| Feature | Description |
...
@@ -46,14 +48,6 @@ Dynamo provides support for improving latency and throughput for vision-and-lang
...
@@ -46,14 +48,6 @@ Dynamo provides support for improving latency and throughput for vision-and-lang
**Status:** ✅ Supported | 🧪 Experimental | ❌ Not supported
**Status:** ✅ Supported | 🧪 Experimental | ❌ Not supported
### Input Format Support
| Format | SGLang | TRT-LLM | vLLM |
|--------|--------|---------|------|
| HTTP/HTTPS URL | ✅ | ✅ | ✅ |
| Data URL (Base64) | ❌ | ❌ | ✅ |
| Pre-computed Embeddings (.pt) | ❌ | ✅ | ❌ |
## Example Workflows
## Example Workflows
Reference implementations for deploying multimodal models:
Reference implementations for deploying multimodal models: