image_edit_api.md

# Image Edit API

vLLM-Omni provides an OpenAI DALL-E compatible API for image edit using diffusion models.

Each server instance runs a single model (specified at startup via `vllm serve <model> --omni`).

## Quick Start

### Start the Server

For example...

```bash
# Qwen-Image
vllm serve Qwen/Qwen-Image-Edit-2511 --omni --port 8000


### Generate Images

**Using curl:**

```bash
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "image=@./xx.png" \
  -F "prompt='this bear is wearing sportwear. holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"
```


**Using OpenAI SDK:**

```python
import base64
from openai import OpenAI
from pathlib import Path
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8000/v1"
)

input_image_url = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"

result = client.images.edit(
    image=[],
    model="Qwen-Image-Edit-2511",
    prompt="Change the bears in the two input images into walking together.",
    size='512x512',
    stream=False,
    output_format='jpeg',
    # url格式
    extra_body={
        "url": [input_image_url1,input_image_url],
        "num_inference_steps": 50,
        "guidance_scale": 1,
        "seed": 777,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("edit_out_http.jpeg", "wb") as f:
    f.write(image_bytes)
```

## API Reference

### Endpoint

```
POST /v1/images/edits
Content-Type: multipart/form-data
```

### Request Parameters

#### OpenAI Standard Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | string | **required** | A text description of the desired image |
| `model` | string | server's model | Model to use (optional, should match server if specified) |
| `image` | string or array | **required** | The image(s) to edit. |
| `n` | integer | 1 | Number of images to generate (1-10) |
| `size` | string | "auto" | Image dimensions in WxH format (e.g., "1024x1024", "512x512"), when set to auto, it decide size from first input image. |
| `response_format` | string | "b64_json" | Response format (only "b64_json" supported) |
| `user` | string | null | User identifier for tracking |
| `output_format` | string | "png" | The format in which the generated images are returned. Must be one of "png", "jpg", "jpeg", "webp". |
| `output_compression` | integer | 100 | The compression level (0-100%) for the generated images. |
| `background` | string or null | "auto" | Allows to set transparency for the background of the generated image(s).

#### vllm-omni Extension Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string or array | None | The image(s) to edit. |
| `negative_prompt` | string | null | Text describing what to avoid in the image |
| `num_inference_steps` | integer | model defaults | Number of diffusion steps |
| `guidance_scale` | float | model defaults | Classifier-free guidance scale (typically 0.0-20.0) |
| `true_cfg_scale` | float | model defaults | True CFG scale (model-specific parameter, may be ignored if not supported) |
| `seed` | integer | null | Random seed for reproducibility |

### Response Format

```json
{
  "created": 1701234567,
  "data": [
    {
      "b64_json": "<base64-encoded PNG>",
      "url": null,
      "revised_prompt": null
    }
  ],
  "output_format": null,
  "size": null,
}
```

## Examples

### Multiple Images input

```bash
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "image=@xx.png" \
  -F "image=@xx.png"
  -F "prompt='this bear is wearing sportwear. holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"
```


## Parameter Handling

The API passes parameters directly to the diffusion pipeline without model-specific transformation:

- **Default values**: When parameters are not specified, the underlying model uses its own defaults
- **Pass-through design**: User-provided values are forwarded directly to the diffusion engine
- **Minimal validation**: Only basic type checking and range validation at the API level

### Parameter Compatibility

The API passes parameters directly to the diffusion pipeline without model-specific validation.

- Unsupported parameters may be silently ignored by the model
- Incompatible values will result in errors from the underlying pipeline
- Recommended values vary by model - consult model documentation

**Best Practice:** Start with the model's recommended parameters, then adjust based on your needs.

## Error Responses

### 400 Bad Request

Invalid parameters (e.g., model mismatch):

```json
{
  "detail": "Invalid size format: '1024x'. Expected format: 'WIDTHxHEIGHT' (e.g., '1024x1024')."
}
```

### 422 Unprocessable Entity

Validation errors (missing required fields):

```json
{
  "detail": "Field 'image' or 'url' is required"
}
```

## Troubleshooting

### Server Not Running

```bash
# Check if server is responding
curl -X http://localhost:8000/v1/images/edit \
  -F "prompt='test'"
```

### Out of Memory

If you encounter OOM errors:
1. Reduce image size: `"size": "512x512"`
2. Reduce inference steps: `"num_inference_steps": 25`

## Development

Enable debug logging to see prompts and generation details:

```bash
vllm serve Qwen/Qwen-Image-Edit-2511 --omni \
  --uvicorn-log-level debug
```