The `--output-modalities` flag determines which endpoint(s) the worker registers. When set to `image`, both `/v1/chat/completions` (returns inline base64 images) and `/v1/images/generations` are available. When set to `video`, the worker serves `/v1/videos`.
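As a rough illustration of the mapping described above, the sketch below encodes which endpoints go with which modality. The dictionary and the `endpoints_for` helper are hypothetical names introduced here for illustration only; they are not part of the worker's code.

```python
# Hypothetical lookup table mirroring the behaviour described in the text.
# Only the two modalities mentioned above are covered.
ENDPOINTS_BY_MODALITY = {
    "image": ["/v1/chat/completions", "/v1/images/generations"],
    "video": ["/v1/videos"],
}


def endpoints_for(modality: str) -> list[str]:
    """Return the endpoints a worker would register for one modality."""
    try:
        return list(ENDPOINTS_BY_MODALITY[modality])
    except KeyError:
        # The text does not describe other values, so this sketch rejects them.
        raise ValueError(f"modality not covered by this sketch: {modality!r}") from None


if __name__ == "__main__":
    for name in ("image", "video"):
        print(name, "->", ", ".join(endpoints_for(name)))
```

A client could use a table like this to decide which route to call once it knows how the worker was launched.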
...
@@ -35,6 +36,7 @@ The `--output-modalities` flag determines which endpoint(s) the worker registers
Image-to-video (I2V) uses the same `/v1/videos` endpoint as text-to-video, with an additional `input_reference` field that provides the source image. The image can be an HTTP URL, a base64 data URI, or a local file path.
Launch with the provided script using `Wan-AI/Wan2.2-TI2V-5B-Diffusers`:
- **Base64 data URI**: `"data:image/png;base64,iVBORw0KGgo..."`
- **Local file path**: `"/path/to/image.png"` or `"file:///path/to/image.png"`
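The accepted `input_reference` forms can be sketched from the client side as follows. This is a minimal sketch using only the standard library: the helpers `to_data_uri`, `file_to_data_uri`, and `build_i2v_request` are hypothetical, and the `model` and `prompt` field names are assumptions; only `input_reference` and the `/v1/videos` endpoint come from the text above.

```python
import base64
import json
import mimetypes
from pathlib import Path


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read a local image and encode it, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(path)
    return to_data_uri(Path(path).read_bytes(), mime_type or "application/octet-stream")


def build_i2v_request(prompt: str, input_reference: str,
                      model: str = "Wan-AI/Wan2.2-TI2V-5B-Diffusers") -> dict:
    """Assemble a request body for /v1/videos.

    `model` and `prompt` are assumed field names; `input_reference` may be an
    HTTP URL, a base64 data URI, or a local file path.
    """
    return {"model": model, "prompt": prompt, "input_reference": input_reference}


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of any PNG file
    body = build_i2v_request("a boat drifting at dusk", to_data_uri(png_signature))
    print(json.dumps(body, indent=2))
```

A data URI keeps the request self-contained, while a local file path presumably has to be readable from wherever the path is resolved, so the data URI form may be the safer choice when client and worker run on different machines.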
The I2V-specific `nvext` fields (`boundary_ratio`, `guidance_scale_2`) control the dual-expert MoE denoising schedule in Wan2.x models. See the [Wan2.2-I2V model card](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers) for details.
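To show where these fields might sit in a request, here is a small sketch that attaches them to an existing payload. It assumes the two fields are nested under a top-level `nvext` object; the helper `with_i2v_nvext` is hypothetical, and the numeric values are placeholders rather than recommended settings.

```python
import json


def with_i2v_nvext(payload: dict, boundary_ratio: float, guidance_scale_2: float) -> dict:
    """Return a copy of `payload` with the I2V-specific nvext fields set.

    Assumes the fields live under a top-level "nvext" key; any nvext entries
    already present in the payload are preserved.
    """
    request = dict(payload)
    request["nvext"] = {
        **payload.get("nvext", {}),
        "boundary_ratio": boundary_ratio,
        "guidance_scale_2": guidance_scale_2,
    }
    return request


if __name__ == "__main__":
    base = {"prompt": "a boat drifting at dusk", "input_reference": "/path/to/image.png"}
    # Placeholder values for illustration only; consult the model card for real ones.
    print(json.dumps(with_i2v_nvext(base, boundary_ratio=0.9, guidance_scale_2=1.0), indent=2))
```

Returning a copy rather than mutating the input keeps the base payload reusable across requests with different settings.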
## CLI Reference
...
@@ -192,6 +235,6 @@ Omni pipelines are configured via YAML stage configs. See [`examples/backends/vl
## Current Limitations
- Image input is supported only for I2V via `input_reference` in `/v1/videos`. All other endpoints accept text prompts only.
- KV cache events are not published for omni workers.
- Each worker supports a single output modality at a time.