# Model Structure Guide

## 📖 Overview

This document introduces the model directory structure of the LightX2V project and is intended to help users organize model files efficiently. With a well-organized directory layout, models can be started with a single command, without manually configuring complex path parameters. For users who need more control, the system also supports flexible manual path configuration.

## 🗂️ Model Directory Structure

### LightX2V Official Model List

View all available models: [LightX2V Official Model Repository](https://huggingface.co/lightx2v)

### Standard Directory Structure

Using `Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V` as an example, the standard file structure is as follows:

```
Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/
├── fp8/                                          # FP8 quantized version (DIT/T5/CLIP)
│   ├── block_xx.safetensors                      # DIT model FP8 quantized version
│   ├── models_t5_umt5-xxl-enc-fp8.pth            # T5 encoder FP8 quantized version
│   ├── clip-fp8.pth                              # CLIP encoder FP8 quantized version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
├── int8/                                         # INT8 quantized version (DIT/T5/CLIP)
│   ├── block_xx.safetensors                      # DIT model INT8 quantized version
│   ├── models_t5_umt5-xxl-enc-int8.pth           # T5 encoder INT8 quantized version
│   ├── clip-int8.pth                             # CLIP encoder INT8 quantized version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
├── original/                                     # Original precision version (DIT/T5/CLIP)
│   ├── distill_model.safetensors                 # DIT model original precision version
│   ├── models_t5_umt5-xxl-enc-bf16.pth           # T5 encoder original precision version
│   ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth  # CLIP encoder original precision version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
```

### 💾 Storage Recommendations

**It is strongly recommended to store model files on an SSD (solid-state drive)**, as this can significantly improve model loading speed and inference performance.

**Recommended storage paths**:
```bash
/mnt/ssd/models/          # Independent SSD mount point
/data/ssd/models/         # Data SSD directory
/opt/models/              # System optimization directory
```
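
If you keep models on a dedicated SSD mount, one simple setup is to create the directory there and symlink it into your workspace. This is a minimal sketch; the paths are illustrative:

```bash
# Create a models directory on the SSD mount point (illustrative path)
sudo mkdir -p /mnt/ssd/models
sudo chown "$USER" /mnt/ssd/models

# Optional: symlink it into your workspace for shorter paths
ln -s /mnt/ssd/models ./models
```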

### Quantization Version Description

Each model includes multiple quantized versions to adapt to different hardware configuration requirements:
- **FP8 Version**: Suitable for GPUs that support FP8 (such as H100, A100, RTX 40 series), providing optimal performance
- **INT8 Version**: Suitable for most GPUs, balancing performance and compatibility, reducing memory usage by approximately 50%
- **Original Precision Version**: Suitable for applications with extremely high precision requirements, providing highest quality output
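
If you are unsure which version your hardware favors, querying the GPU name, compute capability, and memory is a quick sanity check (requires a reasonably recent NVIDIA driver; interpret the result against the guidance above):

```bash
# Print GPU name, compute capability, and total memory to help pick a quantized version
nvidia-smi --query-gpu=name,compute_cap,memory.total --format=csv
```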

## 🚀 Usage Methods

### Environment Setup

#### Installing Hugging Face CLI

Before starting to download models, please ensure that Hugging Face CLI is properly installed:

```bash
# Install huggingface_hub, which provides the huggingface-cli command
pip install -U "huggingface_hub[cli]"

# Login to Hugging Face (optional, but strongly recommended)
huggingface-cli login
```
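
As a quick check that the CLI is installed and, if you chose to log in, that your token is recognized:

```bash
# Confirm the CLI is on PATH
huggingface-cli --help

# If you logged in, this prints your Hugging Face username
huggingface-cli whoami
```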

### Method 1: Complete Model Download (Recommended)

**Advantage**: After downloading the complete model, the system automatically identifies all component paths without manual configuration, providing the most convenient experience.

#### 1. Download Complete Model

```bash
# Use Hugging Face CLI to download complete model
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```
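
After the download completes, the model root should contain the `fp8/`, `int8/`, and `original/` subdirectories shown in the directory structure above:

```bash
# Quick sanity check of the downloaded layout
ls ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
# Expected output includes: fp8/  int8/  original/
```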

#### 2. Start Inference

##### Bash Script Startup

###### Scenario 1: Using Full Precision Model

Modify the configuration in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh):
- `model_path`: Set to the downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`
- `lightx2v_path`: Set to the LightX2V project root directory path
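
For reference, the corresponding variable assignments inside the run script would look roughly like this (a sketch only; substitute your actual paths):

```bash
# Illustrative values inside run_wan_i2v_distill_4step_cfg.sh
lightx2v_path=/path/to/LightX2V
model_path=/path/to/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```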

###### Scenario 2: Using Quantized Model

When using the complete model, if you need to enable quantization, please add the following configuration to the [configuration file](https://github.com/ModelTC/LightX2V/tree/main/configs/distill/wan_i2v_distill_4step_cfg.json):

```json
{
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },                              // DIT model quantization scheme
    "t5_quantized": true,           // Enable T5 quantization
    "t5_quant_scheme": "fp8",       // T5 quantization mode
    "clip_quantized": true,         // Enable CLIP quantization
    "clip_quant_scheme": "fp8"      // CLIP quantization mode
}
```

> **Important Note**: Quantization configurations for each model can be flexibly combined. Quantization paths do not need to be manually specified, as the system will automatically locate the quantized versions of each model.

For detailed explanation of quantization technology, please refer to the [Quantization Documentation](../method_tutorials/quantization.md).

Use the provided bash script for quick startup:

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

##### Gradio Interface Startup

When running inference through the Gradio interface, simply specify the model root directory path at startup; the lightweight VAE can then be toggled from the frontend interface:

```bash
# Image-to-video inference (I2V)
python gradio_demo.py \
    --model_path ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --model_size 14b \
    --task i2v \
    --model_cls wan2.1_distill
```

### Method 2: Selective Download

**Advantage**: Download only the required version (quantized or original precision), saving storage space and download time.

#### 1. Selective Download

```bash
# Use Hugging Face CLI to selectively download non-quantized version
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "original/*"
```

```bash
# Use Hugging Face CLI to selectively download FP8 quantized version
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "fp8/*"
```

```bash
# Use Hugging Face CLI to selectively download INT8 quantized version
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "int8/*"
```

> **Important Note**: When starting inference scripts or Gradio, the `model_path` parameter must still point to the model root directory, not to the subdirectory selected with `--include`. For example: `model_path=./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, not `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/int8`.

#### 2. Start Inference

**The following takes a model for which only the FP8 version has been downloaded as an example:**

##### Bash Script Startup

###### Scenario 1: Using FP8 DIT + FP8 T5 + FP8 CLIP

Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`, and set `lightx2v_path` to your LightX2V project path.

You only need to modify the quantization settings in the configuration file as follows:
```json
{
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },                              // DIT quantization scheme
    "t5_quantized": true,           // Whether to use T5 quantized version
    "t5_quant_scheme": "fp8",       // T5 quantization mode
    "clip_quantized": true,         // Whether to use CLIP quantized version
    "clip_quant_scheme": "fp8",     // CLIP quantization mode
}
```

> **Important Note**: In this case, each component can only be used in its quantized form. Quantization paths do not need to be specified manually; the system automatically locates the quantized version of each component.

###### Scenario 2: Using FP8 DIT + Original Precision T5 + Original Precision CLIP

Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, and set `lightx2v_path` to your LightX2V project path.

Since only quantized weights were downloaded, you need to download the original precision versions of T5 and CLIP separately and point the configuration file's `t5_original_ckpt` and `clip_original_ckpt` entries at them, as follows:
```json
{
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },                              // DIT quantization scheme
    "t5_original_ckpt": "/path/to/models_t5_umt5-xxl-enc-bf16.pth",
    "clip_original_ckpt": "/path/to/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"
}
```
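
If the original precision encoders are not present locally yet, a selective download of just those two files should work (the file names follow the directory listing above):

```bash
# Download only the original precision T5 and CLIP weights
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "original/models_t5_umt5-xxl-enc-bf16.pth" "original/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"
```

The files then land under `original/` inside the model directory, and the `t5_original_ckpt` / `clip_original_ckpt` entries above can point to those locations.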

Use the provided bash script for quick startup:

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

##### Gradio Interface Startup

When performing inference through the Gradio interface, specify the model root directory path at startup:

```bash
# Image-to-video inference (I2V)
python gradio_demo.py \
    --model_path ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/ \
    --model_size 14b \
    --task i2v \
    --model_cls wan2.1_distill
```

> **Important Note**: Since the model root directory only contains quantized versions of each component, the quantization precision for the DIT/T5/CLIP models can only be set to fp8 in the frontend. If you need non-quantized versions of T5/CLIP, manually download the non-quantized weights and place them in the gradio_demo `model_path` directory (`./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`); the T5/CLIP quantization precision can then be set to bf16/fp16.

### Method 3: Manual Configuration

Users can flexibly configure quantization options and paths for each component according to actual needs, achieving mixed use of quantized and non-quantized components. Please ensure that the required model weights have been correctly downloaded and placed in the specified paths.

#### DIT Model Configuration

```json
{
    "dit_quantized_ckpt": "/path/to/dit_quantized_ckpt",    // DIT quantized weights path
    "dit_original_ckpt": "/path/to/dit_original_ckpt",      // DIT original precision weights path
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"  // DIT matrix multiplication operator type, specify as "Default" when not quantized
    }
}
```

#### T5 Model Configuration

```json
{
    "t5_quantized_ckpt": "/path/to/t5_quantized_ckpt",      // T5 quantized weights path
    "t5_original_ckpt": "/path/to/t5_original_ckpt",        // T5 original precision weights path
    "t5_quantized": true,                                   // Whether to enable T5 quantization
    "t5_quant_scheme": "fp8"                                // T5 quantization mode, only effective when t5_quantized is true
}
```

#### CLIP Model Configuration

```json
{
    "clip_quantized_ckpt": "/path/to/clip_quantized_ckpt",  // CLIP quantized weights path
    "clip_original_ckpt": "/path/to/clip_original_ckpt",    // CLIP original precision weights path
    "clip_quantized": true,                                 // Whether to enable CLIP quantization
    "clip_quant_scheme": "fp8"                              // CLIP quantization mode, only effective when clip_quantized is true
}
```

#### VAE Model Configuration

```json
{
    "vae_pth": "/path/to/Wan2.1_VAE.pth",                   // Original VAE model path
    "use_tiny_vae": true,                                   // Whether to use lightweight VAE
    "tiny_vae_path": "/path/to/taew2_1.pth"                 // Lightweight VAE model path
}
```

> **Configuration Notes**:
> - Quantized weights and original precision weights can be flexibly mixed and used, and the system will automatically select the corresponding model based on the configuration
> - The choice of quantization mode depends on your hardware support; FP8 is recommended on high-end GPUs such as H100/A100
> - Lightweight VAE can significantly improve inference speed but may slightly affect generation quality
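
Putting the fragments above together, a fully manual configuration mixing quantized and original precision components might look like the following. This is a sketch only; every path is illustrative and should point to wherever you actually stored the weights:

```json
{
    "dit_quantized_ckpt": "/data/ssd/models/wan_dit_fp8",
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },
    "t5_quantized": true,
    "t5_quant_scheme": "fp8",
    "t5_quantized_ckpt": "/data/ssd/models/models_t5_umt5-xxl-enc-fp8.pth",
    "clip_quantized": false,
    "clip_original_ckpt": "/data/ssd/models/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "vae_pth": "/data/ssd/models/Wan2.1_VAE.pth",
    "use_tiny_vae": true,
    "tiny_vae_path": "/data/ssd/models/taew2_1.pth"
}
```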

## 💡 Best Practices

### Recommended Configurations

**Complete Model Users**:
- Download complete models to enjoy the convenience of automatic path discovery
- Only need to configure quantization schemes and component switches
- Recommended to use bash scripts for quick startup

**Users with Limited Storage Space**:
- Selectively download required quantized versions
- Flexibly mix and use quantized and original precision components
- Use bash scripts to simplify startup process

**Advanced Users**:
- Fully manual path configuration for maximum flexibility
- Support scattered storage of model files
- Can customize bash script parameters

### Performance Optimization Recommendations

- **Use SSD Storage**: Significantly improves model loading speed and inference performance
- **Choose Appropriate Quantization Schemes**:
  - FP8: Suitable for high-end GPUs like H100/A100, high precision
  - INT8: Suitable for general GPUs, small memory footprint
- **Enable Lightweight VAE**: `use_tiny_vae: true` can improve inference speed
- **Reasonable CPU Offload Configuration**: `t5_cpu_offload: true` can save GPU memory
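
For example, the two memory/speed switches above would presumably sit together in the configuration file like this:

```json
{
    "use_tiny_vae": true,
    "t5_cpu_offload": true
}
```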

### Download Optimization Recommendations

- **Use Hugging Face CLI**: More stable than `git clone` and supports resumable downloads
- **Selective Download**: Only download the quantized versions you need, saving time and storage space
- **Network Optimization**: Use a stable network connection, and a proxy or mirror when necessary
- **Resume Download**: Use the `--resume-download` parameter to continue a download after an interruption
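
As an example, a mirror endpoint and resumable download can be combined in a single command (the mirror URL shown is just one option; use whichever endpoint is reachable for you):

```bash
# Route the download through a Hugging Face mirror and allow resuming after interruption
HF_ENDPOINT=https://hf-mirror.com \
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "int8/*" \
    --resume-download
```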

## 🚨 Frequently Asked Questions

### Q: Model files are too large and download speed is slow, what should I do?
A: Use the selective download method to fetch only the quantized version you need, or download through a regional Hugging Face mirror

### Q: Model path does not exist when starting up?
A: Check that the model has been downloaded correctly, verify that the path configuration is right, and confirm that the automatic path discovery mechanism is working properly

### Q: How to switch between different quantization schemes?
A: Modify parameters such as `mm_type`, `t5_quant_scheme`, and `clip_quant_scheme` in the configuration file; see the [Quantization Documentation](../method_tutorials/quantization.md) for details

### Q: How to mix and use quantized and original precision components?
A: Control this with the `t5_quantized` and `clip_quantized` parameters, and manually specify the original precision paths where needed

### Q: How to set paths in configuration files?
A: It is recommended to use automatic path discovery; for manual configuration, refer to the "Manual Configuration" section

### Q: How to verify if automatic path discovery is working properly?
A: Check the startup logs; the code prints the actual model paths being used

### Q: What should I do if bash script startup fails?
A: Check that the path configuration in the script is correct and that the `lightx2v_path` and `model_path` variables are set properly

## 📚 Related Links

- [LightX2V Official Model Repository](https://huggingface.co/lightx2v)
- [Gradio Deployment Guide](./deploy_gradio.md)
- [Configuration File Examples](https://github.com/ModelTC/LightX2V/tree/main/configs)

---

With well-organized model files and flexible configuration options, LightX2V supports a range of usage scenarios: downloading the complete model provides maximum convenience, selective download saves storage space, and manual configuration offers maximum flexibility. The automatic path discovery mechanism means users do not need to remember complex path configurations while keeping the system extensible.