image_edit_api.md 5.6 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
# Image Edit API

vLLM-Omni provides an OpenAI DALL-E compatible API for image edit using diffusion models.

Each server instance runs a single model (specified at startup via `vllm serve <model> --omni`).

## Quick Start

### Start the Server

For example...

```bash
# Qwen-Image
vllm serve Qwen/Qwen-Image-Edit-2511 --omni --port 8000


### Generate Images

**Using curl:**

```bash
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "image=@./xx.png" \
  -F "prompt='this bear is wearing sportwear. holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"
```


**Using OpenAI SDK:**

```python
import base64
from openai import OpenAI
from pathlib import Path
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8000/v1"
)

input_image_url = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"

result = client.images.edit(
    image=[],
    model="Qwen-Image-Edit-2511",
    prompt="Change the bears in the two input images into walking together.",
    size='512x512',
    stream=False,
    output_format='jpeg',
    # url格式
    extra_body={
        "url": [input_image_url1,input_image_url],
        "num_inference_steps": 50,
        "guidance_scale": 1,
        "seed": 777,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("edit_out_http.jpeg", "wb") as f:
    f.write(image_bytes)
```

## API Reference

### Endpoint

```
POST /v1/images/edits
Content-Type: multipart/form-data
```

### Request Parameters

#### OpenAI Standard Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | string | **required** | A text description of the desired image |
| `model` | string | server's model | Model to use (optional, should match server if specified) |
| `image` | string or array | **required** | The image(s) to edit. |
| `n` | integer | 1 | Number of images to generate (1-10) |
| `size` | string | "auto" | Image dimensions in WxH format (e.g., "1024x1024", "512x512"), when set to auto, it decide size from first input image. |
| `response_format` | string | "b64_json" | Response format (only "b64_json" supported) |
| `user` | string | null | User identifier for tracking |
| `output_format` | string | "png" | The format in which the generated images are returned. Must be one of "png", "jpg", "jpeg", "webp". |
| `output_compression` | integer | 100 | The compression level (0-100%) for the generated images. |
| `background` | string or null | "auto" | Allows to set transparency for the background of the generated image(s).

#### vllm-omni Extension Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string or array | None | The image(s) to edit. |
| `negative_prompt` | string | null | Text describing what to avoid in the image |
| `num_inference_steps` | integer | model defaults | Number of diffusion steps |
| `guidance_scale` | float | model defaults | Classifier-free guidance scale (typically 0.0-20.0) |
| `true_cfg_scale` | float | model defaults | True CFG scale (model-specific parameter, may be ignored if not supported) |
| `seed` | integer | null | Random seed for reproducibility |

### Response Format

```json
{
  "created": 1701234567,
  "data": [
    {
      "b64_json": "<base64-encoded PNG>",
      "url": null,
      "revised_prompt": null
    }
  ],
  "output_format": null,
  "size": null,
}
```

## Examples

### Multiple Images input

```bash
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "image=@xx.png" \
  -F "image=@xx.png"
  -F "prompt='this bear is wearing sportwear. holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"
```


## Parameter Handling

The API passes parameters directly to the diffusion pipeline without model-specific transformation:

- **Default values**: When parameters are not specified, the underlying model uses its own defaults
- **Pass-through design**: User-provided values are forwarded directly to the diffusion engine
- **Minimal validation**: Only basic type checking and range validation at the API level

### Parameter Compatibility

The API passes parameters directly to the diffusion pipeline without model-specific validation.

- Unsupported parameters may be silently ignored by the model
- Incompatible values will result in errors from the underlying pipeline
- Recommended values vary by model - consult model documentation

**Best Practice:** Start with the model's recommended parameters, then adjust based on your needs.

## Error Responses

### 400 Bad Request

Invalid parameters (e.g., model mismatch):

```json
{
  "detail": "Invalid size format: '1024x'. Expected format: 'WIDTHxHEIGHT' (e.g., '1024x1024')."
}
```

### 422 Unprocessable Entity

Validation errors (missing required fields):

```json
{
  "detail": "Field 'image' or 'url' is required"
}
```

## Troubleshooting

### Server Not Running

```bash
# Check if server is responding
curl -X http://localhost:8000/v1/images/edit \
  -F "prompt='test'"
```

### Out of Memory

If you encounter OOM errors:
1. Reduce image size: `"size": "512x512"`
2. Reduce inference steps: `"num_inference_steps": 25`

## Development

Enable debug logging to see prompts and generation details:

```bash
vllm serve Qwen/Qwen-Image-Edit-2511 --omni \
  --uvicorn-log-level debug
```