# Lightx2v Gradio Demo Interface

## 📖 Overview

Lightx2v is a lightweight video inference and generation engine that provides a web interface based on Gradio, supporting both Image-to-Video and Text-to-Video generation modes.

This project contains two main demo files:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version

## 🚀 Quick Start

### System Requirements

- Python 3.10+ (recommended)
- CUDA 12.4+ (recommended)
- At least 8GB GPU VRAM
- At least 16GB system memory
- At least 128GB of SSD storage (**💾 Storing model files on an SSD is strongly recommended: with "lazy loading" enabled, it significantly improves model loading speed and inference performance**)
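
The Python and disk requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library. The function names are hypothetical, and treating the 128GB figure as *free* space on the model drive is an assumption, since the list only states a drive size.

```python
import shutil
import sys

# Thresholds copied from the requirements list above.
MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 128  # assumption: interpreted as free space on the model drive


def check_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter is at least the recommended version."""
    return tuple(version_info[:2]) >= minimum


def free_disk_gb(path="."):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def disk_ok(path=".", minimum_gb=MIN_FREE_DISK_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    return free_disk_gb(path) >= minimum_gb
```

Calling `check_python()` and `disk_ok("/mnt/ssd/models")` (with your own model directory) gives a quick yes/no before downloading models. GPU VRAM and CUDA version are not covered here.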

### Install Dependencies

```bash
# Install basic dependencies
pip install -r ../requirements.txt
pip install gradio
```

#### Recommended Optimization Library Configuration

- [Flash attention](https://github.com/Dao-AILab/flash-attention)
- [Sage attention](https://github.com/thu-ml/SageAttention)
- [vllm-kernel](https://github.com/vllm-project/vllm)
- [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
- [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (only supports Ada architecture GPUs)

### 🤖 Supported Models

#### 🎬 Image-to-Video Models

| Model Name | Resolution | Parameters | Features | Recommended Use |
|------------|------------|------------|----------|-----------------|
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD version | Pursue high-quality output |
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Distilled optimized version | Faster inference speed |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD distilled version | High quality + fast inference |

#### 📝 Text-to-Video Models

| Model Name | Parameters | Features | Recommended Use |
|------------|------------|----------|-----------------|
| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 1.3B | Lightweight | Fast prototyping and testing |
| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-T2V-14B-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Distilled optimized version | High quality + fast inference |

**💡 Model Selection Recommendations**:
- **First-time use**: Choose a distilled version
- **Pursuing quality**: Choose 720p resolution or a 14B-parameter model
- **Pursuing speed**: Choose 480p resolution or the 1.3B-parameter model
- **Resource-constrained**: Prioritize distilled versions and lower resolutions

### Startup Methods

#### Method 1: Using Startup Script (Recommended)

```bash
# 1. Edit the startup script to configure relevant paths
vim run_gradio.sh

# Configuration items that need to be modified:
# - lightx2v_path: Lightx2v project root directory path
# - i2v_model_path: Image-to-video model path
# - t2v_model_path: Text-to-video model path

# 💾 Important note: Recommend pointing model paths to SSD storage locations
# Example: /mnt/ssd/models/ or /data/ssd/models/

# 2. Run the startup script
bash run_gradio.sh

# 3. Or start with parameters (recommended)
bash run_gradio.sh --task i2v --lang en --port 8032
# bash run_gradio.sh --task t2v --lang en --port 8032
```

#### Method 2: Direct Command Line Startup

**Image-to-Video Mode:**
```bash
python gradio_demo.py \
    --model_path /path/to/Wan2.1-I2V-14B-720P-Lightx2v \
    --task i2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**Text-to-Video Mode:**
```bash
python gradio_demo.py \
    --model_path /path/to/Wan2.1-T2V-1.3B \
    --task t2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**Chinese Interface Version:**
```bash
python gradio_demo_zh.py \
    --model_path /path/to/model \
    --task i2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

## 📋 Command Line Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--model_path` | str | ✅ | - | Model folder path |
| `--model_cls` | str | ❌ | wan2.1 | Model class (currently only supports wan2.1) |
| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
| `--server_port` | int | ❌ | 7862 | Server port |
| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |

## 🎯 Features

### Basic Settings

#### Model Type Selection
- **Wan2.1 14B**: Large parameter count, high generation quality, suitable for high-quality video generation
- **Wan2.1 1.3B**: Lightweight model, fast speed, suitable for rapid prototyping and testing

#### Input Parameters
- **Prompt**: Describe the expected video content
- **Negative Prompt**: Specify elements you don't want to appear
- **Resolution**: Supports multiple preset resolutions (480p/540p/720p)
- **Random Seed**: Controls the randomness of generation results
- **Inference Steps**: Affects the balance between generation quality and speed

#### Video Parameters
- **FPS**: Frames per second
- **Total Frames**: Video length
- **CFG Scale Factor**: Controls prompt influence strength (1-10)
- **Distribution Shift**: Controls the degree of style deviation in generation (0-10)
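
FPS and Total Frames together determine clip length: duration in seconds is total frames divided by FPS. The sketch below shows that arithmetic along with the documented ranges for CFG Scale Factor and Distribution Shift. The `VideoParams` class and the sample values (81 frames at 16 FPS) are illustrative assumptions, not names or defaults taken from the demo.

```python
from dataclasses import dataclass


@dataclass
class VideoParams:
    """Hypothetical container for the video parameters listed above."""
    fps: int
    total_frames: int
    cfg_scale: float
    shift: float

    def validate(self):
        """Raise ValueError if a value falls outside the documented range."""
        if self.fps <= 0 or self.total_frames <= 0:
            raise ValueError("fps and total_frames must be positive")
        if not 1 <= self.cfg_scale <= 10:
            raise ValueError("cfg_scale must be between 1 and 10")
        if not 0 <= self.shift <= 10:
            raise ValueError("shift must be between 0 and 10")

    @property
    def duration_seconds(self):
        """Clip length in seconds: total frames divided by frames per second."""
        return self.total_frames / self.fps


params = VideoParams(fps=16, total_frames=81, cfg_scale=5.0, shift=5.0)
params.validate()
print(f"{params.duration_seconds:.2f} s")
```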

### Advanced Optimization Options

#### GPU Memory Optimization
- **Chunked Rotary Position Embedding**: Saves GPU memory
- **Rotary Embedding Chunk Size**: Controls chunk granularity
- **Clean CUDA Cache**: Promptly frees GPU memory

#### Asynchronous Offloading
- **CPU Offloading**: Transfers partial computation to CPU
- **Lazy Loading**: Loads model components on demand, significantly reducing system memory consumption
- **Offload Granularity Control**: Fine-grained control of offloading strategies
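
To illustrate the general idea behind lazy loading, here is a minimal sketch of the pattern: each component is loaded the first time it is requested and can be released afterwards. This is a generic illustration, not Lightx2v's implementation. The `LazyComponents` class and the component names are hypothetical.

```python
class LazyComponents:
    """Load components on first access instead of all at startup (illustrative)."""

    def __init__(self, loaders):
        # `loaders` maps a component name to a zero-argument function that loads it.
        self._loaders = dict(loaders)
        self._loaded = {}

    def get(self, name):
        """Return the component, loading it only if it is not already in memory."""
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def release(self, name):
        """Drop the component so its memory can be reclaimed."""
        self._loaded.pop(name, None)

    def loaded_names(self):
        """Return the names of the components currently held in memory."""
        return sorted(self._loaded)


# Stand-in loaders; a real loader would read weights from disk.
components = LazyComponents({
    "text_encoder": lambda: "text-encoder-weights",
    "vae": lambda: "vae-weights",
})
components.get("text_encoder")    # loaded now, on first use
print(components.loaded_names())  # "vae" has not been loaded yet
```

Because every first access reads from disk, this pattern is where the storage speed of the model directory matters most.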

#### Low-Precision Quantization
- **Attention Operators**: Flash Attention, Sage Attention, etc.
- **Quantization Operators**: vLLM, SGL, Q8F, etc.
- **Precision Modes**: FP8, INT8, BF16, etc.

#### VAE Optimization
- **Lightweight VAE**: Accelerates decoding process
- **VAE Tiling Inference**: Reduces memory usage

#### Feature Caching
- **Tea Cache**: Caches intermediate features to accelerate generation
- **Cache Threshold**: Controls cache trigger conditions
- **Key Step Caching**: Writes cache only at key steps
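
The role of the cache threshold can be sketched as follows: the change in a step's input is accumulated across steps, and while the accumulated change stays below the threshold the cached result is reused instead of being recomputed. This is a simplified illustration of threshold-based feature caching under stated assumptions, not the Tea Cache code used by Lightx2v. The class and function names are hypothetical, and key step caching is not modelled.

```python
def relative_l1(current, previous):
    """Relative L1 distance between two equally sized feature vectors."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm else float("inf")


class ThresholdFeatureCache:
    """Reuse a cached result while the accumulated input change is small."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.accumulated_change = 0.0
        self.previous_input = None
        self.cached_output = None

    def step(self, features, compute):
        """Return (output, reused); `compute` is the expensive function."""
        if self.previous_input is not None:
            self.accumulated_change += relative_l1(features, self.previous_input)
        self.previous_input = list(features)

        if self.cached_output is not None and self.accumulated_change < self.threshold:
            return self.cached_output, True  # cache hit: skip the computation

        # Cache miss: recompute, refresh the cache, and reset the accumulator.
        self.cached_output = compute(features)
        self.accumulated_change = 0.0
        return self.cached_output, False


cache = ThresholdFeatureCache(threshold=0.1)
for step_input in ([1.0, 1.0], [1.01, 1.0], [1.02, 1.0], [2.0, 2.0]):
    output, reused = cache.step(step_input, lambda x: [2 * v for v in x])
    print(step_input, "reused" if reused else "computed")
```

A higher threshold reuses the cache more often, trading accuracy for speed; a lower threshold recomputes more often.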

## 🔧 Auto-Configuration Feature

After enabling "Auto-configure Inference Options", the system will automatically optimize parameters based on your hardware configuration:

### GPU Memory Rules
- **80GB+**: Default configuration, no optimization needed
- **48GB**: Enable CPU offloading, offload ratio 50%
- **40GB**: Enable CPU offloading, offload ratio 80%
- **32GB**: Enable CPU offloading, offload ratio 100%
- **24GB**: Enable BF16 precision, VAE tiling
- **16GB**: Enable chunked offloading, rotary embedding chunking
- **12GB**: Enable cache cleaning, lightweight VAE
- **8GB**: Enable quantization, lazy loading

### CPU Memory Rules
- **128GB+**: Default configuration
- **64GB**: Enable DIT quantization
- **32GB**: Enable lazy loading
- **16GB**: Enable full model quantization
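
The GPU and CPU rules above can be read as two lookup tables. The sketch below encodes them and picks the highest tier whose threshold the machine meets. Several things here are assumptions rather than documented behavior: the option key names are hypothetical, tiers are treated as independent (not cumulative), and a machine below the smallest tier falls back to the last rule.

```python
# (minimum memory in GB, options to enable), ordered from largest to smallest.
GPU_RULES = [
    (80, {}),
    (48, {"cpu_offload": True, "offload_ratio": 0.5}),
    (40, {"cpu_offload": True, "offload_ratio": 0.8}),
    (32, {"cpu_offload": True, "offload_ratio": 1.0}),
    (24, {"bf16": True, "vae_tiling": True}),
    (16, {"chunked_offload": True, "rotary_chunk": True}),
    (12, {"clean_cuda_cache": True, "lightweight_vae": True}),
    (8, {"quantization": True, "lazy_load": True}),
]

CPU_RULES = [
    (128, {}),
    (64, {"dit_quantization": True}),
    (32, {"lazy_load": True}),
    (16, {"full_model_quantization": True}),
]


def pick_tier(memory_gb, rules):
    """Return the options of the first tier whose threshold is met."""
    for threshold, options in rules:
        if memory_gb >= threshold:
            return dict(options)
    # Assumption: below the smallest tier, use the most aggressive settings.
    return dict(rules[-1][1])


def auto_config(gpu_gb, cpu_gb):
    """Combine the GPU-derived and CPU-derived options into one dictionary."""
    config = pick_tier(gpu_gb, GPU_RULES)
    config.update(pick_tier(cpu_gb, CPU_RULES))
    return config


print(auto_config(gpu_gb=24, cpu_gb=32))
```

Under these assumptions, a 24GB GPU with 32GB of system memory gets BF16 precision, VAE tiling, and lazy loading.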

## ⚠️ Important Notes

### 🚀 Low-Resource Device Optimization Recommendations

**💡 For devices with insufficient VRAM or limited performance**:

- **🎯 Model Selection**: Prefer distilled models (StepDistill-CfgDistill)
- **⚡ Inference Steps**: Set to 4 steps
- **🔧 CFG Settings**: Disable the CFG option to improve generation speed
- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"

### 🔧 Quick Optimization Configuration Examples

```bash
# Start with distilled model
bash run_gradio.sh --task i2v

# Interface setting recommendations (set these in the web UI, not on the command line):
# - Inference Steps: 25
# - CFG Scale Factor: 4
# - Resolution: 832x480
# - Auto-Configuration: Enabled
# - Quantization Scheme: int8
# - Tea Cache: Enabled
```

## 📁 File Structure

```
lightx2v/app/
├── gradio_demo.py          # English interface demo
├── gradio_demo_zh.py       # Chinese interface demo
├── run_gradio.sh           # Startup script
├── README.md               # Documentation
├── saved_videos/           # Generated video save directory
└── inference_logs.log      # Inference logs
```

## 🎨 Interface Description

### Basic Settings Tab
- **Input Parameters**: Model type, prompts, resolution, and other basic settings
- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
- **Output Settings**: Video save path configuration

### Advanced Options Tab
- **GPU Memory Optimization**: Memory management related options
- **Asynchronous Offloading**: CPU offloading and lazy loading
- **Low-Precision Quantization**: Various quantization optimization options
- **VAE Optimization**: Variational Autoencoder optimization
- **Feature Caching**: Cache strategy configuration

## 🔍 Troubleshooting

### Common Issues

**💡 Tip**: With "Auto-configure Inference Options" enabled, the system optimizes parameter settings for your hardware automatically, so performance issues are uncommon. If you do encounter problems, refer to the following solutions:

1. **Insufficient CUDA Memory**
   - Enable CPU offloading
   - Reduce resolution
   - Enable quantization options

2. **Insufficient System Memory**
   - Enable CPU offloading
   - Enable lazy loading option
   - Enable quantization options

3. **Slow Generation Speed**
   - Reduce inference steps
   - Enable auto-configuration
   - Use lightweight models
   - Enable Tea Cache
   - Use quantization operators
   - 💾 **Check if models are stored on SSD**

4. **Slow Model Loading**
   - 💾 **Migrate models to SSD storage**
   - Enable lazy loading option
   - Check disk I/O performance
   - Consider using NVMe SSD

5. **Poor Video Quality**
   - Increase inference steps
   - Increase CFG scale factor
   - Use 14B models
   - Optimize prompts

### Log Viewing

```bash
# View inference logs
tail -f inference_logs.log

# View GPU usage
nvidia-smi

# View system resources
htop
```
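
To record GPU memory from a script rather than watching `nvidia-smi` interactively, its CSV query mode can be parsed in Python. This is a minimal sketch with hypothetical function names. It assumes `nvidia-smi` is on `PATH` and reports numeric values in MiB, and it does not handle GPUs that report `[N/A]`.

```python
import subprocess

# nvidia-smi query mode: one CSV row per GPU, no header, no unit suffix.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output):
    """Parse rows such as '1024, 24576' into (used_mib, total_mib) tuples."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return one (used_mib, total_mib) tuple per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` before and after a generation run shows how much VRAM a given set of options actually uses.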


**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.