# Gradio Deployment Guide

## 📖 Overview

Lightx2v is a lightweight video inference and generation engine that provides a web interface based on Gradio, supporting both Image-to-Video and Text-to-Video generation modes.

For Windows systems, we provide a convenient one-click deployment solution with automatic environment configuration and intelligent parameter optimization. Please refer to the [One-Click Gradio Startup (Recommended)](./deploy_local_windows.md#one-click-gradio-startup-recommended) section for detailed instructions.

![Gradio English Interface](../../../../assets/figs/portabl_windows/pic_gradio_en.png)

## 📁 File Structure

```
LightX2V/app/
├── gradio_demo.py          # English interface demo
├── gradio_demo_zh.py       # Chinese interface demo
├── run_gradio.sh          # Startup script
├── README.md              # Documentation
├── outputs/               # Generated video save directory
└── inference_logs.log     # Inference logs
```

This project contains two main demo files:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version

## 🚀 Quick Start

### Environment Requirements

Follow the [Quick Start Guide](../getting_started/quickstart.md) to set up the environment.

#### Recommended Optimization Library Configuration

- [Flash Attention](https://github.com/Dao-AILab/flash-attention)
- [Sage Attention](https://github.com/thu-ml/SageAttention)
- [vllm-kernel](https://github.com/vllm-project/vllm)
- [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
- [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (only supports Ada-architecture GPUs)

Install each operator as needed by following the instructions on its project homepage.
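
A rough sketch of what installing these optional kernels can look like is shown below (package names and build steps are indicative and vary by version and CUDA setup; always prefer each project's official instructions):

```bash
# Optional acceleration kernels -- install only what your GPU/CUDA setup supports.
# Package names and build steps are indicative; check each project's homepage first.
pip install flash-attn --no-build-isolation                 # Flash Attention
pip install sageattention                                   # Sage Attention
pip install vllm                                            # vLLM kernels
pip install sgl-kernel                                      # SGLang kernels
pip install git+https://github.com/KONAKONA666/q8_kernels   # Q8 kernels (Ada GPUs only)
```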

### 🤖 Supported Models

#### 🎬 Image-to-Video Models

| Model Name | Resolution | Parameters | Features | Recommended Use |
|------------|------------|------------|----------|-----------------|
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v) | 480p | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v) | 720p | 14B | HD version | Pursue high-quality output |
| ✅ [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) | 480p | 14B | Distilled optimized version | Faster inference speed |
| ✅ [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v) | 720p | 14B | HD distilled version | High quality + fast inference |

#### 📝 Text-to-Video Models

| Model Name | Parameters | Features | Recommended Use |
|------------|------------|----------|-----------------|
| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v) | 1.3B | Lightweight | Fast prototyping and testing |
| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v) | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v) | 14B | Distilled optimized version | High quality + fast inference |

**💡 Model Selection Recommendations**:
- **First-time use**: Recommend choosing distilled versions (`wan2.1_distill`)
- **Pursuing quality**: Choose 720p resolution or 14B parameter models
- **Pursuing speed**: Choose 480p resolution or 1.3B parameter models, prioritize distilled versions
- **Resource-constrained**: Prioritize distilled versions and lower resolutions
- **Real-time applications**: Strongly recommend using distilled models (`wan2.1_distill`)

**🎯 Model Category Description**:
- **`wan2.1`**: Standard model; delivers the best video generation quality and is suited to scenarios with extremely high quality requirements
- **`wan2.1_distill`**: Distilled model; optimized through knowledge distillation, it greatly reduces computation time while maintaining good quality, making it suitable for most application scenarios

**📥 Model Download**:

Refer to the [Model Structure Documentation](./model_structure.md) to download either the complete models (with both quantized and non-quantized weights) or only the quantized or non-quantized versions.

**Download Options**:

- **Complete Model**: If you download the complete model with both quantized and non-quantized weights, you can freely choose the quantization precision for DIT/T5/CLIP in the advanced options of the `Gradio` Web frontend.

- **Non-quantized Version Only**: If you download only the non-quantized weights, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to bf16/fp16. To use quantized models, manually download the quantized weights into the `i2v_model_path` or `t2v_model_path` directory configured when starting Gradio.

- **Quantized Version Only**: If you download only the quantized weights, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to fp8 or int8 (depending on the weights you downloaded). To use non-quantized models, manually download the non-quantized weights into the `i2v_model_path` or `t2v_model_path` directory configured when starting Gradio.

- **Note**: Whether you download the complete model or only part of it, `i2v_model_path` and `t2v_model_path` should point to the top-level model directory, e.g. `Wan2.1-I2V-14B-480P-Lightx2v/`, not `Wan2.1-I2V-14B-480P-Lightx2v/int8`.
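
For example, weights can be fetched with the Hugging Face CLI (a hedged sketch; the local directory below is an assumed example and should match the directory you pass as `i2v_model_path` or `t2v_model_path`):

```bash
# Download a complete I2V model into a local directory (example path; adjust to your setup)
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-Lightx2v \
    --local-dir /path/to/models/Wan2.1-I2V-14B-480P-Lightx2v
```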

### Startup Methods

#### Method 1: Using Startup Script (Recommended)

**Linux Environment:**
```bash
# 1. Edit the startup script to configure relevant paths
cd app/
vim run_gradio.sh

# Configuration items that need to be modified:
# - lightx2v_path: Lightx2v project root directory path
# - i2v_model_path: Image-to-video model path
# - t2v_model_path: Text-to-video model path

# 💾 Important note: Recommend pointing model paths to SSD storage locations
# Example: /mnt/ssd/models/ or /data/ssd/models/

# 2. Run the startup script
bash run_gradio.sh

# 3. Or start with parameters (recommended using distilled models)
bash run_gradio.sh --task i2v --lang en --model_cls wan2.1 --model_size 14b --port 8032
bash run_gradio.sh --task t2v --lang en --model_cls wan2.1 --model_size 1.3b --port 8032
bash run_gradio.sh --task i2v --lang en --model_cls wan2.1_distill --model_size 14b --port 8032
bash run_gradio.sh --task t2v --lang en --model_cls wan2.1_distill --model_size 1.3b --port 8032
```

**Windows Environment:**
```cmd
REM 1. Edit the startup script to configure relevant paths
cd app\
notepad run_gradio_win.bat

REM Configuration items that need to be modified:
REM - lightx2v_path: Lightx2v project root directory path
REM - i2v_model_path: Image-to-video model path
REM - t2v_model_path: Text-to-video model path

REM 💾 Important note: Recommend pointing model paths to SSD storage locations
REM Example: D:\models\ or E:\models\

REM 2. Run the startup script
run_gradio_win.bat

REM 3. Or start with parameters (recommended using distilled models)
run_gradio_win.bat --task i2v --lang en --model_cls wan2.1 --model_size 14b --port 8032
run_gradio_win.bat --task t2v --lang en --model_cls wan2.1 --model_size 1.3b --port 8032
run_gradio_win.bat --task i2v --lang en --model_cls wan2.1_distill --model_size 14b --port 8032
run_gradio_win.bat --task t2v --lang en --model_cls wan2.1_distill --model_size 1.3b --port 8032
```

#### Method 2: Direct Command Line Startup

**Linux Environment:**

**Image-to-Video Mode:**
```bash
python gradio_demo.py \
    --model_path /path/to/Wan2.1-I2V-14B-480P-Lightx2v \
    --model_cls wan2.1 \
    --model_size 14b \
    --task i2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**Text-to-Video Mode:**
```bash
python gradio_demo.py \
    --model_path /path/to/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v \
    --model_cls wan2.1_distill \
    --model_size 14b \
    --task t2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**Windows Environment:**

**Image-to-Video Mode:**
```cmd
python gradio_demo.py ^
    --model_path D:\models\Wan2.1-I2V-14B-480P-Lightx2v ^
    --model_cls wan2.1 ^
    --model_size 14b ^
    --task i2v ^
    --server_name 127.0.0.1 ^
    --server_port 7862
```

**Text-to-Video Mode:**
```cmd
python gradio_demo.py ^
    --model_path D:\models\Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v ^
    --model_cls wan2.1_distill ^
    --model_size 14b ^
    --task t2v ^
    --server_name 127.0.0.1 ^
    --server_port 7862
```

## 📋 Command Line Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--model_path` | str | ✅ | - | Model folder path |
| `--model_cls` | str | ❌ | wan2.1 | Model class: `wan2.1` (standard model) or `wan2.1_distill` (distilled model, faster inference) |
| `--model_size` | str | ✅ | - | Model size: `14b` (image-to-video or text-to-video) or `1.3b` (text-to-video) |
| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
| `--server_port` | int | ❌ | 7862 | Server port |
| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |
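
The Chinese interface demo accepts command-line arguments in the same way; assuming it mirrors the flags of `gradio_demo.py`, a launch might look like this:

```bash
# Hedged example: launch the Chinese interface demo (assumes the same flags as gradio_demo.py)
python gradio_demo_zh.py \
    --model_path /path/to/Wan2.1-I2V-14B-480P-Lightx2v \
    --model_cls wan2.1 \
    --model_size 14b \
    --task i2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```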

## 🎯 Features

### Basic Settings

#### Input Parameters
- **Prompt**: Describe the expected video content
- **Negative Prompt**: Specify elements you don't want to appear
- **Resolution**: Supports multiple preset resolutions (480p/540p/720p)
- **Random Seed**: Controls the randomness of generation results
- **Inference Steps**: Affects the balance between generation quality and speed

#### Video Parameters
- **FPS**: Frames per second
- **Total Frames**: Video length
- **CFG Scale Factor**: Controls prompt influence strength (1-10)
- **Distribution Shift**: Controls generation style deviation degree (0-10)
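
For example, at 16 FPS a setting of 81 total frames yields a clip of roughly 5 seconds (81 / 16 ≈ 5.1 s); the exact defaults shown in the interface may vary with the selected model.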

### Advanced Optimization Options

#### GPU Memory Optimization
- **Chunked Rotary Position Embedding**: Saves GPU memory
- **Rotary Embedding Chunk Size**: Controls chunk granularity
- **Clean CUDA Cache**: Promptly frees GPU memory

#### Asynchronous Offloading
- **CPU Offloading**: Transfers partial computation to CPU
- **Lazy Loading**: Loads model components on demand, significantly reducing system memory consumption
- **Offload Granularity Control**: Fine-grained control of offloading strategies

#### Low-Precision Quantization
- **Attention Operators**: Flash Attention, Sage Attention, etc.
- **Quantization Operators**: vLLM, SGL, Q8F, etc.
- **Precision Modes**: FP8, INT8, BF16, etc.

#### VAE Optimization
- **Lightweight VAE**: Accelerates decoding process
- **VAE Tiling Inference**: Reduces memory usage

#### Feature Caching
- **Tea Cache**: Caches intermediate features to accelerate generation
- **Cache Threshold**: Controls cache trigger conditions
- **Key Step Caching**: Writes cache only at key steps

## 🔧 Auto-Configuration Feature

After enabling "Auto-configure Inference Options", the system will automatically optimize parameters based on your hardware configuration:

### GPU Memory Rules
- **80GB+**: Default configuration, no optimization needed
- **48GB**: Enable CPU offloading, offload ratio 50%
- **40GB**: Enable CPU offloading, offload ratio 80%
- **32GB**: Enable CPU offloading, offload ratio 100%
- **24GB**: Enable BF16 precision, VAE tiling
- **16GB**: Enable chunked offloading, rotary embedding chunking
- **12GB**: Enable cache cleaning, lightweight VAE
- **8GB**: Enable quantization, lazy loading

### CPU Memory Rules
- **128GB+**: Default configuration
- **64GB**: Enable DIT quantization
- **32GB**: Enable lazy loading
- **16GB**: Enable full model quantization
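
To see which bracket your machine falls into before launching, you can query GPU and system memory from the shell. This is a minimal sketch using standard tools; the actual auto-configuration is performed inside the app, and the thresholds below merely mirror the rules listed above:

```bash
# Rough pre-flight check: report GPU VRAM and system RAM to anticipate
# which auto-configuration bracket applies (thresholds mirror the rules above).
gpu_mb=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n1)
ram_gb=$(free -g | awk '/^Mem:/ {print $2}')

echo "GPU VRAM: ${gpu_mb} MiB, system RAM: ${ram_gb} GiB"

if   [ "$gpu_mb" -ge 81920 ]; then echo "GPU: default configuration should work without extra optimization"
elif [ "$gpu_mb" -ge 24576 ]; then echo "GPU: expect CPU offloading / BF16 / VAE tiling to be enabled"
else                               echo "GPU: expect quantization, lazy loading and chunked offloading"
fi

if [ "$ram_gb" -lt 32 ]; then
    echo "RAM: expect lazy loading and full model quantization"
fi
```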

## ⚠️ Important Notes

### 🚀 Low-Resource Device Optimization Recommendations

**💡 For devices with insufficient VRAM or performance constraints**:

- **🎯 Model Selection**: Prioritize using distilled version models (`wan2.1_distill`)
- **⚡ Inference Steps**: Recommend setting to 4 steps
- **🔧 CFG Settings**: Recommend disabling CFG option to improve generation speed
- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"
- **💾 Storage Optimization**: Ensure models are stored on SSD for optimal loading performance
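
Putting these recommendations together, a low-VRAM-friendly launch could look like the following (the flags mirror the startup examples above; auto-configuration, inference steps, and CFG are then adjusted in the web interface):

```bash
# Low-resource example: distilled 1.3B text-to-video model; in the web UI,
# enable auto-configuration, set inference steps to 4, and disable CFG.
bash run_gradio.sh --task t2v --lang en --model_cls wan2.1_distill --model_size 1.3b --port 8032
```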

## 🎨 Interface Description

### Basic Settings Tab
- **Input Parameters**: Prompts, resolution, and other basic settings
- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
- **Output Settings**: Video save path configuration

### Advanced Options Tab
- **GPU Memory Optimization**: Memory management related options
- **Asynchronous Offloading**: CPU offloading and lazy loading
- **Low-Precision Quantization**: Various quantization optimization options
- **VAE Optimization**: Variational Autoencoder optimization
- **Feature Caching**: Cache strategy configuration

## 🔍 Troubleshooting

### Common Issues

**💡 Tip**: Generally, after enabling "Auto-configure Inference Options", the system will automatically optimize parameter settings based on your hardware configuration, and performance issues usually won't occur. If you encounter problems, please refer to the following solutions:

1. **Gradio Webpage Opens Blank**
   - Try upgrading gradio: `pip install --upgrade gradio`

2. **CUDA Memory Insufficient**
   - Enable CPU offloading
   - Reduce resolution
   - Enable quantization options

3. **System Memory Insufficient**
   - Enable CPU offloading
   - Enable lazy loading option
   - Enable quantization options

4. **Slow Generation Speed**
   - Reduce inference steps
   - Enable auto-configuration
   - Use lightweight models
   - Enable Tea Cache
   - Use quantization operators
   - 💾 **Check if models are stored on SSD**

5. **Slow Model Loading**
   - 💾 **Migrate models to SSD storage**
   - Enable lazy loading option
   - Check disk I/O performance
   - Consider using NVMe SSD

6. **Poor Video Quality**
   - Increase inference steps
   - Increase CFG scale factor
   - Use 14B models
   - Optimize prompts

### Log Viewing

```bash
# View inference logs
tail -f inference_logs.log

# View GPU usage
nvidia-smi

# View system resources
htop
```

Issues and Pull Requests to improve this project are welcome!

**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.