# Gradio Deployment Guide

## 📖 Overview

LightX2V is a lightweight video inference and generation engine. It provides a Gradio-based web interface that supports both Image-to-Video (I2V) and Text-to-Video (T2V) generation.

## 📁 File Structure

```
LightX2V/app/
├── gradio_demo.py         # English interface demo
├── gradio_demo_zh.py      # Chinese interface demo
├── run_gradio.sh          # Startup script (Linux)
├── run_gradio_win.bat     # Startup script (Windows)
├── README.md              # Documentation
├── saved_videos/          # Directory for generated videos
└── inference_logs.log     # Inference logs
```

The app includes two demo scripts:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version

## 🚀 Quick Start

### Environment Requirements

Follow the [Quick Start Guide](../getting_started/quickstart.md) to set up the environment.

#### Recommended Optimization Libraries

- [Flash Attention](https://github.com/Dao-AILab/flash-attention)
- [Sage Attention](https://github.com/thu-ml/SageAttention)
- [vllm-kernel](https://github.com/vllm-project/vllm)
- [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
- [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (Ada architecture GPUs only)

Install each operator as needed by following the instructions on its project homepage.
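Before enabling these operators in the interface, it can help to confirm which ones are importable in the current Python environment. The sketch below assumes the usual top-level import names (`flash_attn`, `sageattention`, `vllm`, `sgl_kernel`, `q8_kernels`); these names are assumptions, so check each project's README if a library shows as missing after installation.

```python
import importlib.util

# Assumed top-level import names for the optional libraries listed above;
# confirm them against each project's installation instructions.
OPTIONAL_LIBS = {
    "Flash Attention": "flash_attn",
    "Sage Attention": "sageattention",
    "vllm-kernel": "vllm",
    "sglang-kernel": "sgl_kernel",
    "q8-kernel": "q8_kernels",
}


def check_optional_libs(libs=OPTIONAL_LIBS):
    """Map each display name to True if its module can be found, without importing it."""
    return {name: importlib.util.find_spec(module) is not None for name, module in libs.items()}


if __name__ == "__main__":
    for name, found in check_optional_libs().items():
        print(f"{name:<16}{'installed' if found else 'not found'}")
```

`find_spec` only locates the module, so the check stays fast and does not trigger CUDA initialization.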

### 🤖 Supported Models

#### 🎬 Image-to-Video Models

| Model Name | Resolution | Parameters | Features | Recommended Use |
|------------|------------|------------|----------|-----------------|
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v) | 480p | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v) | 720p | 14B | HD version | High-quality output |
| ✅ [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) | 480p | 14B | Distilled version | Faster inference |
| ✅ [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v) | 720p | 14B | HD distilled version | High quality + fast inference |

#### 📝 Text-to-Video Models

| Model Name | Parameters | Features | Recommended Use |
|------------|------------|----------|-----------------|
| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v) | 1.3B | Lightweight | Fast prototyping and testing |
| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v) | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v) | 14B | Distilled version | High quality + fast inference |

**💡 Model Selection Recommendations**:
- **First-time use**: Choose a distilled version (`wan2.1_distill`)
- **Prioritizing quality**: Choose 720p resolution or 14B models
- **Prioritizing speed**: Choose 480p resolution or the 1.3B model, and prefer distilled versions
- **Limited resources**: Prefer distilled versions and lower resolutions
- **Real-time applications**: Use distilled models (`wan2.1_distill`)

**🎯 Model Category Description**:
- **`wan2.1`**: Standard model. Provides the best video generation quality; suited to scenarios with the highest quality requirements
- **`wan2.1_distill`**: Distilled model (step and CFG distillation). Inference is significantly faster while quality remains good; suitable for most use cases
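The tables and recommendations above can be condensed into a small lookup. `pick_model` is a hypothetical helper, not part of LightX2V; it only encodes the models listed in this guide and returns the Hugging Face repo ID together with the matching `--model_cls` and `--model_size` values.

```python
# Hypothetical helper; the repo IDs are the models listed in the tables above.
I2V_MODELS = {
    # (resolution, distilled) -> Hugging Face repo ID; all listed I2V models are 14B
    ("480p", False): "lightx2v/Wan2.1-I2V-14B-480P-Lightx2v",
    ("720p", False): "lightx2v/Wan2.1-I2V-14B-720P-Lightx2v",
    ("480p", True): "lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v",
    ("720p", True): "lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v",
}
T2V_MODELS = {
    # (model_size, distilled) -> Hugging Face repo ID; no 1.3B distilled model is listed
    ("1.3b", False): "lightx2v/Wan2.1-T2V-1.3B-Lightx2v",
    ("14b", False): "lightx2v/Wan2.1-T2V-14B-Lightx2v",
    ("14b", True): "lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v",
}


def pick_model(task, distilled=True, resolution="480p", model_size="14b"):
    """Return (repo_id, model_cls, model_size); `resolution` applies to i2v only."""
    if task == "i2v":
        repo, size = I2V_MODELS.get((resolution, distilled)), "14b"
    elif task == "t2v":
        repo, size = T2V_MODELS.get((model_size, distilled)), model_size
    else:
        raise ValueError(f"unknown task: {task!r}")
    if repo is None:
        raise ValueError("no listed model matches this combination")
    model_cls = "wan2.1_distill" if distilled else "wan2.1"
    return repo, model_cls, size
```

Raising on unlisted combinations (such as a distilled 1.3B T2V model) keeps the helper from silently pointing at a model that the tables do not include.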

### Startup Methods

#### Method 1: Using Startup Script (Recommended)

**Linux Environment:**
```bash
# 1. Edit the startup script to configure the relevant paths
cd app/
vim run_gradio.sh

# Settings to modify:
# - lightx2v_path: LightX2V project root directory
# - i2v_model_path: image-to-video model path
# - t2v_model_path: text-to-video model path

# 💾 Tip: point model paths to SSD storage for faster loading
# Example: /mnt/ssd/models/ or /data/ssd/models/

# 2. Run the startup script
bash run_gradio.sh

# 3. Or pass options on the command line (distilled models are recommended)
bash run_gradio.sh --task i2v --lang en --model_cls wan2.1 --model_size 14b --port 8032
bash run_gradio.sh --task t2v --lang en --model_cls wan2.1 --model_size 1.3b --port 8032
bash run_gradio.sh --task i2v --lang en --model_cls wan2.1_distill --model_size 14b --port 8032
bash run_gradio.sh --task t2v --lang en --model_cls wan2.1_distill --model_size 14b --port 8032
```

**Windows Environment:**
```cmd
REM 1. Edit the startup script to configure the relevant paths
cd app\
notepad run_gradio_win.bat

REM Settings to modify:
REM - lightx2v_path: LightX2V project root directory
REM - i2v_model_path: image-to-video model path
REM - t2v_model_path: text-to-video model path

REM Tip: point model paths to SSD storage for faster loading
REM Example: D:\models\ or E:\models\

REM 2. Run the startup script
run_gradio_win.bat

REM 3. Or pass options on the command line (distilled models are recommended)
run_gradio_win.bat --task i2v --lang en --model_cls wan2.1 --model_size 14b --port 8032
run_gradio_win.bat --task t2v --lang en --model_cls wan2.1 --model_size 1.3b --port 8032
run_gradio_win.bat --task i2v --lang en --model_cls wan2.1_distill --model_size 14b --port 8032
run_gradio_win.bat --task t2v --lang en --model_cls wan2.1_distill --model_size 14b --port 8032
```

#### Method 2: Direct Command Line Startup

**Linux Environment:**

**Image-to-Video Mode:**
```bash
python gradio_demo.py \
    --model_path /path/to/Wan2.1-I2V-14B-480P-Lightx2v \
    --model_cls wan2.1 \
    --model_size 14b \
    --task i2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**Text-to-Video Mode:**
```bash
python gradio_demo.py \
    --model_path /path/to/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v \
    --model_cls wan2.1_distill \
    --model_size 14b \
    --task t2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**Windows Environment:**

**Image-to-Video Mode:**
```cmd
python gradio_demo.py ^
    --model_path D:\models\Wan2.1-I2V-14B-480P-Lightx2v ^
    --model_cls wan2.1 ^
    --model_size 14b ^
    --task i2v ^
    --server_name 127.0.0.1 ^
    --server_port 7862
```

**Text-to-Video Mode:**
```cmd
python gradio_demo.py ^
    --model_path D:\models\Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v ^
    --model_cls wan2.1_distill ^
    --model_size 14b ^
    --task t2v ^
    --server_name 127.0.0.1 ^
    --server_port 7862
```
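The bash (`\`) and cmd (`^`) line continuations above differ, but launching from Python with an argument list sidesteps both. `build_command` and `launch` are hypothetical helpers that use only the flags documented in this guide; run them with `app/` as the working directory so `gradio_demo.py` resolves.

```python
import subprocess
import sys


def build_command(model_path, model_cls, model_size, task,
                  server_name="0.0.0.0", server_port=7862, script="gradio_demo.py"):
    """Argument list for gradio_demo.py using only the flags documented in this guide."""
    return [
        sys.executable, script,
        "--model_path", str(model_path),
        "--model_cls", model_cls,
        "--model_size", model_size,
        "--task", task,
        "--server_name", server_name,
        "--server_port", str(server_port),
    ]


def launch(cmd, cwd="."):
    """Start the demo; check=True raises CalledProcessError if it exits with an error."""
    return subprocess.run(cmd, cwd=cwd, check=True)


# Example (not executed here):
# launch(build_command("/path/to/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v",
#                      "wan2.1_distill", "14b", "t2v"), cwd="app")
```

Passing a list rather than a shell string also avoids quoting problems with model paths that contain spaces.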

## 📋 Command Line Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--model_path` | str | ✅ | - | Model folder path |
| `--model_cls` | str | ❌ | wan2.1 | Model class: `wan2.1` (standard model) or `wan2.1_distill` (distilled model, faster inference) |
| `--model_size` | str | ✅ | - | Model size: `14b` (image-to-video or text-to-video) or `1.3b` (text-to-video only) |
| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
| `--server_port` | int | ❌ | 7862 | Server port |
| `--server_name` | str | ❌ | 0.0.0.0 | Address the server binds to (`0.0.0.0` listens on all network interfaces) |
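The parameter table maps directly onto `argparse`. This is an illustrative sketch only; the parser actually defined in `gradio_demo.py` may use different choices or validation. The extra check that rejects `1.3b` for `i2v` follows the `--model_size` description above.

```python
import argparse


def make_parser():
    # Mirrors the parameter table; not the parser shipped in gradio_demo.py.
    p = argparse.ArgumentParser(description="LightX2V Gradio demo (sketch)")
    p.add_argument("--model_path", type=str, required=True, help="Model folder path")
    p.add_argument("--model_cls", type=str, default="wan2.1",
                   choices=["wan2.1", "wan2.1_distill"])
    p.add_argument("--model_size", type=str, required=True, choices=["14b", "1.3b"])
    p.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"])
    p.add_argument("--server_port", type=int, default=7862)
    p.add_argument("--server_name", type=str, default="0.0.0.0")
    return p


def parse_args(argv=None):
    parser = make_parser()
    args = parser.parse_args(argv)
    # Per the table, the 1.3B model is text-to-video only.
    if args.model_size == "1.3b" and args.task != "t2v":
        parser.error("--model_size 1.3b is only available with --task t2v")
    return args
```

`parser.error` prints the message and exits with status 2, the same way argparse reports its own validation failures.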

## 🎯 Features

### Basic Settings

#### Input Parameters
- **Prompt**: Describe the expected video content
- **Negative Prompt**: Specify elements you don't want to appear
- **Resolution**: Supports multiple preset resolutions (480p/540p/720p)
- **Random Seed**: Controls the randomness of generation results
- **Inference Steps**: Number of denoising steps; more steps generally improve quality but slow generation

#### Video Parameters
- **FPS**: Frames per second
- **Total Frames**: Number of frames; together with FPS, determines video length
- **CFG Scale Factor**: Controls how strongly the output follows the prompt (1-10)
- **Distribution Shift**: Controls the degree of style deviation in generation (0-10)
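Clip length follows directly from Total Frames and FPS. The second helper assumes Wan2.1 models expect frame counts of the form 4k + 1 (for example 81); that constraint is an assumption here, so confirm it against the frame-count options the interface offers.

```python
import math


def video_duration_seconds(total_frames: int, fps: float) -> float:
    """Clip length implied by the Total Frames and FPS settings."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return total_frames / fps


def frames_for_duration(seconds: float, fps: float) -> int:
    """Smallest frame count of the form 4k + 1 covering `seconds` (4k + 1 is an assumption)."""
    needed = math.ceil(seconds * fps)
    k = max(0, math.ceil((needed - 1) / 4))
    return 4 * k + 1


print(video_duration_seconds(81, 16))  # 5.0625
print(frames_for_duration(5, 16))      # 81
```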

### Advanced Optimization Options

#### GPU Memory Optimization
- **Chunked Rotary Position Embedding**: Saves GPU memory
- **Rotary Embedding Chunk Size**: Controls chunk granularity
- **Clean CUDA Cache**: Promptly frees GPU memory
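Chunking trades a little extra looping for a smaller working set: rotary embeddings depend only on each token's position, so processing positions in slices gives the same result while only one slice's cos/sin tables exist at a time. The pure-Python sketch below illustrates that property with standard RoPE (base 10000); LightX2V's implementation works on GPU tensors and its chunking details may differ.

```python
import math


def _rotate_chunk(chunk, start, base):
    """Rotate each consecutive feature pair by a position-dependent angle (standard RoPE)."""
    dim = len(chunk[0])
    inv_freq = [base ** (-i / dim) for i in range(0, dim, 2)]
    positions = range(start, start + len(chunk))
    # cos/sin tables cover only the positions in this chunk, which bounds temporary memory.
    cos_t = [[math.cos(p * f) for f in inv_freq] for p in positions]
    sin_t = [[math.sin(p * f) for f in inv_freq] for p in positions]
    out = []
    for vec, cos_row, sin_row in zip(chunk, cos_t, sin_t):
        rotated = []
        for j, (c, s) in enumerate(zip(cos_row, sin_row)):
            x0, x1 = vec[2 * j], vec[2 * j + 1]
            rotated += [x0 * c - x1 * s, x0 * s + x1 * c]
        out.append(rotated)
    return out


def apply_rope(seq, chunk_size=None, base=10000.0):
    """Apply RoPE to even-length vectors; chunk_size=None processes the whole sequence at once."""
    step = chunk_size or max(1, len(seq))
    out = []
    for start in range(0, len(seq), step):
        out.extend(_rotate_chunk(seq[start:start + step], start, base))
    return out
```

A smaller chunk size lowers peak memory but adds loop overhead, which matches the role of the "Rotary Embedding Chunk Size" option.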

#### Asynchronous Offloading
- **CPU Offloading**: Keeps part of the model in CPU memory and transfers it to the GPU when needed
- **Lazy Loading**: Loads model components on demand, significantly reducing system memory usage
- **Offload Granularity Control**: Sets how finely the model is split for offloading
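One way to picture an offload ratio is as a split of the transformer blocks between GPU and CPU, with the next offloaded block prefetched while the current one runs. The sketch below only simulates that schedule; `plan_offload` and `schedule` are hypothetical names, and which blocks LightX2V actually keeps resident is not specified in this guide.

```python
def plan_offload(num_blocks: int, offload_ratio: float):
    """Split block indices into GPU-resident and CPU-offloaded lists (hypothetical policy)."""
    if not 0.0 <= offload_ratio <= 1.0:
        raise ValueError("offload_ratio must be between 0 and 1")
    n_off = round(num_blocks * offload_ratio)
    # Keeping the first blocks resident is an arbitrary choice made for illustration.
    return list(range(num_blocks - n_off)), list(range(num_blocks - n_off, num_blocks))


def schedule(num_blocks: int, offloaded):
    """Simulated event order: prefetch the next offloaded block while the current one computes."""
    off = set(offloaded)
    events = [("load", 0)] if 0 in off else []
    for i in range(num_blocks):
        if i + 1 in off:
            events.append(("prefetch", i + 1))  # would overlap with compute(i) asynchronously
        events.append(("compute", i))
        if i in off:
            events.append(("evict", i))  # frees GPU memory once the block is done
    return events
```

Overlapping the prefetch with computation is what makes the offloading "asynchronous": transfer time is hidden as long as moving one block takes less time than computing the previous one.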

#### Low-Precision Quantization
- **Attention Operators**: Flash Attention, Sage Attention, etc.
- **Quantization Operators**: vLLM, SGL, Q8F, etc.
- **Precision Modes**: FP8, INT8, BF16, etc.
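As a rough picture of what INT8 quantization does to weights, here is symmetric per-tensor quantization in plain Python. Production kernels such as those listed above often use finer-grained scales (for example per channel) and fused GPU kernels, so treat this as a conceptual sketch only.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8: one scale maps the largest magnitude to 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction; rounding error is at most about scale / 2 per value."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.0, 0.49]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Storing one byte per weight instead of two (BF16) roughly halves weight memory, at the cost of the small rounding error shown by `restored`.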

#### VAE Optimization
- **Lightweight VAE**: Accelerates decoding process
- **VAE Tiling Inference**: Reduces memory usage
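Tiled VAE decoding splits the latent into overlapping tiles, decodes each tile separately, and blends the overlaps to hide seams, so peak memory depends on the tile size rather than the full frame. The sketch below only computes tile boxes; the tile size of 64 and overlap of 8 are illustrative assumptions, not LightX2V defaults.

```python
def tile_ranges(length, tile, overlap):
    """1-D (start, end) pairs covering [0, length) with tiles that overlap by at least `overlap`."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [(0, length)]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride)) + [length - tile]  # last tile ends at the edge
    return [(s, s + tile) for s in starts]


def tiles_2d(height, width, tile=64, overlap=8):
    """(top, bottom, left, right) boxes; each would be decoded separately and then blended."""
    return [(t, b, l, r)
            for t, b in tile_ranges(height, tile, overlap)
            for l, r in tile_ranges(width, tile, overlap)]
```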

#### Feature Caching
- **Tea Cache**: Caches intermediate features to accelerate generation
- **Cache Threshold**: Controls how readily cached features are reused; higher values reuse the cache more often, which is faster but may lower quality
- **Key Step Caching**: Writes cache only at key steps
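A simplified version of the TeaCache decision rule: accumulate the relative L1 change of the model input between steps, reuse the cached result while the total stays below the threshold, and recompute (resetting the total) once it crosses it. A higher threshold therefore skips more steps, which is faster but may cost quality. This sketch omits details of the published method, such as its rescaling polynomial, and is not LightX2V's code.

```python
def rel_l1(a, b):
    """Relative L1 distance between two equal-length vectors."""
    denom = sum(abs(x) for x in b) or 1e-12
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


def decide_steps(step_inputs, threshold):
    """One 'compute' / 'reuse' decision per denoising step."""
    decisions, acc, prev = [], 0.0, None
    for x in step_inputs:
        if prev is None:
            decisions.append("compute")  # the first step always runs the full model
        else:
            acc += rel_l1(x, prev)
            if acc < threshold:
                decisions.append("reuse")  # input barely changed: reuse the cached result
            else:
                decisions.append("compute")
                acc = 0.0  # start accumulating again after a full computation
        prev = x
    return decisions
```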

## 🔧 Auto-Configuration Feature

When "Auto-configure Inference Options" is enabled, the system automatically adjusts parameters based on your hardware:

### GPU Memory Rules
- **80GB+**: Default configuration, no optimization needed
- **48GB**: Enable CPU offloading, offload ratio 50%
- **40GB**: Enable CPU offloading, offload ratio 80%
- **32GB**: Enable CPU offloading, offload ratio 100%
- **24GB**: Enable BF16 precision, VAE tiling
- **16GB**: Enable chunked offloading, rotary embedding chunking
- **12GB**: Enable cache cleaning, lightweight VAE
- **8GB**: Enable quantization, lazy loading

### CPU Memory Rules
- **128GB+**: Default configuration
- **64GB**: Enable DiT quantization
- **32GB**: Enable lazy loading
- **16GB**: Enable full model quantization
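The two rule lists can be expressed as a lookup from available memory to the listed settings. This is a hypothetical sketch: the guide does not say how in-between sizes are handled or whether smaller tiers also inherit the settings of larger ones, so `match_tier` rounds down to the nearest listed tier and returns only that tier's entry.

```python
# (minimum memory in GB, settings), largest tier first; copied from the rules above.
GPU_RULES = [
    (80, ["default configuration"]),
    (48, ["CPU offloading, ratio 50%"]),
    (40, ["CPU offloading, ratio 80%"]),
    (32, ["CPU offloading, ratio 100%"]),
    (24, ["BF16 precision", "VAE tiling"]),
    (16, ["chunked offloading", "rotary embedding chunking"]),
    (12, ["CUDA cache cleaning", "lightweight VAE"]),
    (8, ["quantization", "lazy loading"]),
]
CPU_RULES = [
    (128, ["default configuration"]),
    (64, ["DiT quantization"]),
    (32, ["lazy loading"]),
    (16, ["full model quantization"]),
]


def match_tier(memory_gb, rules):
    """Settings of the largest tier not exceeding `memory_gb` (round-down is an assumption)."""
    for min_gb, settings in rules:
        if memory_gb >= min_gb:
            return settings
    return rules[-1][1]  # below the smallest listed tier: use the most aggressive entry
```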

## ⚠️ Important Notes

### 🚀 Low-Resource Device Optimization Recommendations

**💡 For devices with insufficient VRAM or performance constraints**:

- **🎯 Model Selection**: Prioritize using distilled version models (`wan2.1_distill`)
- **⚡ Inference Steps**: Use 4 steps
- **🔧 CFG Settings**: Disable CFG to speed up generation
- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"
- **💾 Storage Optimization**: Ensure models are stored on SSD for optimal loading performance
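Disabling CFG saves time because classifier-free guidance needs two denoiser passes per step, one with the prompt and one without, combined as `uncond + scale * (cond - uncond)`. CFG-distilled models (the `CfgDistill` in the model names) are generally trained so that the unconditional pass can be skipped. The helpers below are illustrative and operate on plain lists rather than model outputs.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def model_calls(steps, use_cfg):
    """Denoiser forward passes per video: CFG adds an unconditional pass at every step."""
    return steps * (2 if use_cfg else 1)


print(model_calls(4, use_cfg=True))   # 8
print(model_calls(4, use_cfg=False))  # 4
```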

## 🎨 Interface Description

### Basic Settings Tab
- **Input Parameters**: Prompts, resolution, and other basic settings
- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
- **Output Settings**: Video save path configuration

### Advanced Options Tab
- **GPU Memory Optimization**: Memory management related options
- **Asynchronous Offloading**: CPU offloading and lazy loading
- **Low-Precision Quantization**: Various quantization optimization options
- **VAE Optimization**: Variational Autoencoder optimization
- **Feature Caching**: Cache strategy configuration

## 🔍 Troubleshooting

### Common Issues

**💡 Tip**: In most cases, enabling "Auto-configure Inference Options" tunes the settings to your hardware and avoids performance problems. If problems persist, try the following:

1. **CUDA Out of Memory**
   - Enable CPU offloading
   - Reduce resolution
   - Enable quantization options

2. **Insufficient System Memory**
   - Enable CPU offloading
   - Enable lazy loading option
   - Enable quantization options

3. **Slow Generation Speed**
   - Reduce inference steps
   - Enable auto-configuration
   - Use lightweight models
   - Enable Tea Cache
   - Use quantization operators
   - 💾 **Check if models are stored on SSD**

4. **Slow Model Loading**
   - 💾 **Migrate models to SSD storage**
   - Enable lazy loading option
   - Check disk I/O performance
   - Consider using NVMe SSD

5. **Poor Video Quality**
   - Increase inference steps
   - Increase CFG scale factor
   - Use 14B models
   - Optimize prompts

### Log Viewing

```bash
# View inference logs
tail -f inference_logs.log

# View GPU usage
nvidia-smi

# View system resources
htop
```
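To read GPU memory from a script instead of watching `nvidia-smi`, its CSV query mode can be parsed directly. `parse_gpu_memory` and `query_gpu_memory` are hypothetical helpers; the query flags are standard `nvidia-smi` options, and with `nounits` the values are in MiB.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Parse lines such as '0, 1024, 81920' into dicts (values in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        idx, used, total = (field.strip() for field in line.split(","))
        gpus.append({"index": int(idx), "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and parse its output; raises if nvidia-smi fails."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```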

Issues and pull requests to improve this project are welcome!

**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.