f"Request {request_id}: Generating text for {len(token_ids)} input tokens"
)
"""
This mode returns OpenAI-compatible streaming chunks
Text input -> Text output / Image output
"""
# (ayushag) TODO: Support all types of OmniPrompt. Right now only text prompts work.
# (ayushag) TODO: Document all I/O formats from vllm omni
# OmniText prompts also accept additional negative prompts; that needs to be supported as well.
# Support multimodal content as well. That will involve applying the tokenizer to the prompt and loading images. Follow the general multimodal support pattern.
<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# [Experimental] Running Omni Models with vLLM
Dynamo supports omni (multimodal generation) models via the [vLLM-Omni](https://github.com/vllm-project/vllm-omni) backend. This enables multi-stage pipelines for tasks like text-to-text and text-to-image generation through an OpenAI-compatible API.
## Prerequisites
This guide assumes familiarity with deploying Dynamo with vLLM as described in [README.md](/docs/backends/vllm/README.md).
## Quick Start
### Text-to-Text
Launch an aggregated deployment (frontend + omni worker) using the provided script:
```bash
bash examples/backends/vllm/launch/agg_omni.sh
```
This starts `Qwen/Qwen2.5-Omni-7B` with a single-stage thinker config on one GPU. The model can be overridden at launch.
Once the deployment is up, send a chat completion request to the frontend:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-Omni-7B",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 50,
    "stream": false
  }'
```
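The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the URL and payload simply mirror the curl command above.

```python
import json
import urllib.request

# Assumed frontend address, copied from the curl example in this guide.
DEFAULT_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model, prompt, max_tokens=50, url=DEFAULT_URL):
    """Build (but do not send) a non-streaming chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the response body parsed as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running deployment, `send_chat_request(build_chat_request("Qwen/Qwen2.5-Omni-7B", "What is 2+2?"))` would return the parsed response body. Keeping request construction separate from sending makes the payload easy to inspect without a live server.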
### Text-to-Image
Text-to-image uses vLLM-Omni's built-in default stage configs (no custom YAML needed). Launch without a stage config path so vLLM-Omni loads the model's default multi-stage pipeline:
```bash
# Start frontend
python -m dynamo.frontend &
# Start omni worker (vLLM-Omni loads default stage configs for the model)
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm \
  --model <your-text-to-image-model> \
  --omni \
  --connector none
```
Images are returned as base64-encoded PNGs in the response:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-text-to-image-model>",
    "messages": [{"role": "user", "content": "A cat sitting on a windowsill"}],
    "stream": false
  }'
```
The response contains image data URLs in the content field.
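Turning such a data URL back into an image file can be sketched as follows. This is only an illustration: the helper name `decode_png_data_url` is hypothetical, the URL is assumed to have the form `data:image/png;base64,<payload>`, and nothing is assumed about where in the response JSON the URL sits.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_data_url(data_url):
    """Return the raw PNG bytes carried by a base64 data URL."""
    header, separator, encoded = data_url.partition(",")
    well_formed = (
        bool(separator)
        and header.startswith("data:image/png")
        and header.endswith(";base64")
    )
    if not well_formed:
        raise ValueError("expected a data:image/png;base64,... URL")
    png_bytes = base64.b64decode(encoded, validate=True)
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload does not look like a PNG")
    return png_bytes
```

The returned bytes can be written to disk with, for example, `pathlib.Path("output.png").write_bytes(png_bytes)`.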
## Stage Configuration
Omni pipelines are configured via YAML stage configs. See [`examples/backends/vllm/launch/stage_configs/single_stage_llm.yaml`](/examples/backends/vllm/launch/stage_configs/single_stage_llm.yaml) for an example. Key fields:
- **`model_stage`**: Pipeline stage name (e.g., `thinker`, `talker`, `code2wav`)
- **`final_output_type`**: Output format, either `text` or `image`
- **`is_comprehension`**: Whether this stage processes input text/multimodal content
For full documentation on stage config format, supported fields, and multi-stage pipeline examples, see the [vLLM-Omni Stage Configs documentation](https://docs.vllm.ai/projects/vllm-omni/en/latest/configuration/stage_configs/).
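As a rough illustration of the three key fields listed in this guide, the sketch below checks a single stage once it has been loaded into a Python dict. It assumes the fields sit side by side in one mapping, which may not match the real YAML nesting, and it only accepts the two output types named here. `StageSummary` and `summarize_stage` are hypothetical names, not part of Dynamo or vLLM-Omni.

```python
from dataclasses import dataclass

# Output types named in this guide; the real backend may accept others.
ALLOWED_OUTPUT_TYPES = {"text", "image"}
REQUIRED_FIELDS = ("model_stage", "final_output_type", "is_comprehension")


@dataclass(frozen=True)
class StageSummary:
    model_stage: str
    final_output_type: str
    is_comprehension: bool


def summarize_stage(stage):
    """Validate the three documented fields of one stage mapping."""
    missing = [name for name in REQUIRED_FIELDS if name not in stage]
    if missing:
        raise KeyError(f"missing stage fields: {missing}")
    if stage["final_output_type"] not in ALLOWED_OUTPUT_TYPES:
        raise ValueError(
            f"unsupported final_output_type: {stage['final_output_type']!r}"
        )
    if not isinstance(stage["is_comprehension"], bool):
        raise TypeError("is_comprehension must be a boolean")
    return StageSummary(
        model_stage=stage["model_stage"],
        final_output_type=stage["final_output_type"],
        is_comprehension=stage["is_comprehension"],
    )
```

For example, `summarize_stage({"model_stage": "thinker", "final_output_type": "text", "is_comprehension": True})` returns a `StageSummary` for a text-producing thinker stage, while an unknown output type raises `ValueError`.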
## Current Limitations
- Only text prompts are supported (no multimodal input yet)
- KV cache events are not published for omni workers
"No updates to send for model {}: chat_model_removed: {}, completions_model_removed: {}, embeddings_model_removed: {}, tensor_model_removed: {}, prefill_model_removed: {}",
"No updates to send for model {}: chat_model_removed: {}, completions_model_removed: {}, embeddings_model_removed: {}, tensor_model_removed: {}, images_model_removed: {}, prefill_model_removed: {}",