# Model Structure Guide

## 📖 Overview

This document introduces the model directory structure of the LightX2V project and explains how to organize model files for a smooth experience. With the recommended directory layout, models can be started with "one-click" convenience, without manually configuring complex path parameters; flexible manual path configuration is also supported for users who need it.

## 🗂️ Model Directory Structure

### LightX2V Official Model List

View all available models: [LightX2V Official Model Repository](https://huggingface.co/lightx2v)

### Standard Directory Structure

Using `Wan2.1-I2V-14B-480P-LightX2V` as an example, the standard file structure is as follows:

```
Wan2.1-I2V-14B-480P-LightX2V/
├── fp8/                                          # FP8 quantized version (DIT/T5/CLIP)
│   ├── block_xx.safetensors                      # DIT model FP8 quantized version
│   ├── models_t5_umt5-xxl-enc-fp8.pth            # T5 encoder FP8 quantized version
│   ├── clip-fp8.pth                              # CLIP encoder FP8 quantized version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
├── int8/                                         # INT8 quantized version (DIT/T5/CLIP)
│   ├── block_xx.safetensors                      # DIT model INT8 quantized version
│   ├── models_t5_umt5-xxl-enc-int8.pth           # T5 encoder INT8 quantized version
│   ├── clip-int8.pth                             # CLIP encoder INT8 quantized version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
├── original/                                     # Original precision version (DIT/T5/CLIP)
│   ├── xx.safetensors                            # DIT model original precision version
│   ├── models_t5_umt5-xxl-enc-bf16.pth           # T5 encoder original precision version
│   ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth  # CLIP encoder original precision version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
```

Using `Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V` as an example, the standard file structure is as follows:

```
Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/
├── distill_fp8/                                  # FP8 quantized version (DIT/T5/CLIP)
│   ├── block_xx.safetensors                      # DIT model FP8 quantized version
│   ├── models_t5_umt5-xxl-enc-fp8.pth            # T5 encoder FP8 quantized version
│   ├── clip-fp8.pth                              # CLIP encoder FP8 quantized version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
├── distill_int8/                                 # INT8 quantized version (DIT/T5/CLIP)
│   ├── block_xx.safetensors                      # DIT model INT8 quantized version
│   ├── models_t5_umt5-xxl-enc-int8.pth           # T5 encoder INT8 quantized version
│   ├── clip-int8.pth                             # CLIP encoder INT8 quantized version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
├── distill_models/                               # Original precision version (DIT/T5/CLIP)
│   ├── distill_model.safetensors                 # DIT model original precision version
│   ├── models_t5_umt5-xxl-enc-bf16.pth           # T5 encoder original precision version
│   ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth  # CLIP encoder original precision version
│   ├── Wan2.1_VAE.pth                            # VAE variational autoencoder
│   ├── taew2_1.pth                               # Lightweight VAE (optional)
│   └── config.json                               # Model configuration file
├── loras/
│   ├── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors  # Distillation model lora
```

### 💾 Storage Recommendations

**It is strongly recommended to store model files on an SSD**, as this significantly improves model loading speed and inference startup performance.

**Recommended storage paths**:
```bash
/mnt/ssd/models/          # Dedicated SSD mount point
/data/ssd/models/         # SSD data directory
/opt/models/              # Local directory on the system drive
```
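
If you want to double-check that the chosen path actually sits on an SSD, or keep the weights on the SSD while referencing them from elsewhere, the following sketch may help (it assumes the SSD is mounted at `/mnt/ssd`; adjust to your system):

```bash
# ROTA=0 marks a non-rotational (solid-state) device
lsblk -d -o NAME,ROTA,MOUNTPOINT

# Keep the weights on the SSD and expose them through a symlink
mkdir -p /mnt/ssd/models
ln -s /mnt/ssd/models ~/models
```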

### Quantization Version Description

Each model is provided in multiple precision versions to suit different hardware configurations:
- **FP8 Version**: For GPUs with FP8 support (such as H100, A100, and RTX 40 series), offering the best performance
- **INT8 Version**: For most GPUs, balancing performance and compatibility while reducing memory usage by roughly 50%
- **Original Precision Version**: For use cases with the highest precision requirements, producing the highest-quality output

## 🚀 Usage Methods

### Environment Setup

#### Installing Hugging Face CLI

Before downloading models, make sure the Hugging Face CLI is installed:

```bash
# Install huggingface_hub
pip install huggingface_hub

# Or install with the CLI extras
pip install "huggingface_hub[cli]"

# Login to Hugging Face (optional, but strongly recommended)
huggingface-cli login
```

### Method 1: Complete Model Download (Recommended)

**Advantage**: After the complete model is downloaded, the system automatically resolves all component paths without any manual configuration, which is the most convenient option.

#### 1. Download Complete Model

```bash
# Use Hugging Face CLI to download complete model
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```

#### 2. Start Inference

##### Bash Script Startup

###### Scenario 1: Using Full Precision Model

Modify the configuration in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh):
- `model_path`: Set to the downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`
- `lightx2v_path`: Set to the LightX2V project root directory path
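
As a sketch, the two variables inside the script might look like this (paths are examples; adjust them to your local setup):

```bash
# In scripts/wan/run_wan_i2v_distill_4step_cfg.sh (illustrative values)
lightx2v_path=/path/to/LightX2V
model_path=./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```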

###### Scenario 2: Using Quantized Model

When using the complete model, if you need to enable quantization, please add the following configuration to the [configuration file](https://github.com/ModelTC/LightX2V/tree/main/configs/distill/wan_i2v_distill_4step_cfg.json):

```json
{
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },                              // DIT model quantization scheme
    "t5_quantized": true,           // Enable T5 quantization
    "t5_quant_scheme": "fp8",       // T5 quantization mode
    "clip_quantized": true,         // Enable CLIP quantization
    "clip_quant_scheme": "fp8"      // CLIP quantization mode
}
```

> **Important Note**: Quantization configurations for each model can be flexibly combined. Quantization paths do not need to be manually specified, as the system will automatically locate the quantized versions of each model.

For detailed explanation of quantization technology, please refer to the [Quantization Documentation](../method_tutorials/quantization.md).

Use the provided bash script for quick startup:

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

##### Gradio Interface Startup

When running inference through the Gradio interface, simply pass the model root directory at startup; the lightweight VAE can then be toggled from the frontend interface:

```bash
# Image-to-video inference (I2V)
python gradio_demo.py \
    --model_path ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --model_size 14b \
    --task i2v \
    --model_cls wan2.1_distill
```

### Method 2: Selective Download

**Advantage**: Download only the versions you need (quantized or non-quantized), saving storage space and download time.

#### 1. Selective Download

```bash
# Use Hugging Face CLI to selectively download non-quantized version
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "distill_models/*"
```

```bash
# Use Hugging Face CLI to selectively download FP8 quantized version
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "distill_fp8/*"
```

```bash
# Use Hugging Face CLI to selectively download INT8 quantized version
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "distill_int8/*"
```

> **Important Note**: When starting inference scripts or Gradio, `model_path` must still point to the model root directory, not the subdirectory selected with `--include`. For example: `model_path=./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, not `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/distill_int8`.

#### 2. Start Inference

**Taking a model for which only the FP8 version has been downloaded as an example:**

##### Bash Script Startup

###### Scenario 1: Using FP8 DIT + FP8 T5 + FP8 CLIP

Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`, and set `lightx2v_path` to your LightX2V project path.

You only need to modify the quantization settings in the configuration file as follows:
```json
{
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },                              // DIT quantization scheme
    "t5_quantized": true,           // Whether to use T5 quantized version
    "t5_quant_scheme": "fp8",       // T5 quantization mode
    "clip_quantized": true,         // Whether to use CLIP quantized version
    "clip_quant_scheme": "fp8",     // CLIP quantization mode
}
```

> **Important Note**: In this case each component can only be used in its quantized form. Quantization paths do not need to be specified manually; the system automatically locates the quantized version of each component.

###### Scenario 2: Using FP8 DIT + Original Precision T5 + Original Precision CLIP

Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, and set `lightx2v_path` to your LightX2V project path.

Since only quantized weights were downloaded, you need to obtain the original-precision T5 and CLIP weights separately and point to them via `t5_original_ckpt` and `clip_original_ckpt` in the configuration file, as shown below (a download sketch follows the config block):
```json
{
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },                              // DIT quantization scheme
    "t5_original_ckpt": "/path/to/models_t5_umt5-xxl-enc-bf16.pth",
    "clip_original_ckpt": "/path/to/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"
}
```
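
One way to fetch just these two files is another selective download; this is a sketch, with the filenames taken from the `distill_models/` listing above:

```bash
# Download only the original-precision T5 and CLIP weights
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --include "distill_models/models_t5_umt5-xxl-enc-bf16.pth" \
              "distill_models/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"
```

The files then land under `distill_models/` inside the model root, and those paths can be used for `t5_original_ckpt` and `clip_original_ckpt`.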

Use the provided bash script for quick startup:

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

##### Gradio Interface Startup

When performing inference through the Gradio interface, specify the model root directory path at startup:

```bash
# Image-to-video inference (I2V)
python gradio_demo.py \
    --model_path ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/ \
    --model_size 14b \
    --task i2v \
    --model_cls wan2.1_distill
```

> **Important Note**: Since the model root directory contains only quantized versions of each component, the DIT/T5/CLIP quantization precision in the frontend can only be set to fp8. If you need the non-quantized T5/CLIP, manually download the non-quantized weights and place them in the gradio_demo `model_path` directory (`./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`); the T5/CLIP quantization precision can then be set to bf16/fp16.

### Method 3: Manual Configuration

Each component's quantization options and paths can be configured independently, allowing quantized and non-quantized components to be mixed freely. Make sure the required model weights have been downloaded and placed at the configured paths.

#### DIT Model Configuration

```json
{
    "dit_quantized_ckpt": "/path/to/dit_quantized_ckpt",    // DIT quantized weights path
    "dit_original_ckpt": "/path/to/dit_original_ckpt",      // DIT original precision weights path
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"  // DIT matrix multiplication operator type, specify as "Default" when not quantized
    }
}
```

#### T5 Model Configuration

```json
{
    "t5_quantized_ckpt": "/path/to/t5_quantized_ckpt",      // T5 quantized weights path
    "t5_original_ckpt": "/path/to/t5_original_ckpt",        // T5 original precision weights path
    "t5_quantized": true,                                   // Whether to enable T5 quantization
    "t5_quant_scheme": "fp8"                                // T5 quantization mode, only effective when t5_quantized is true
}
```

#### CLIP Model Configuration

```json
{
    "clip_quantized_ckpt": "/path/to/clip_quantized_ckpt",  // CLIP quantized weights path
    "clip_original_ckpt": "/path/to/clip_original_ckpt",    // CLIP original precision weights path
    "clip_quantized": true,                                 // Whether to enable CLIP quantization
    "clip_quant_scheme": "fp8"                              // CLIP quantization mode, only effective when clip_quantized is true
}
```

#### VAE Model Configuration

```json
{
    "vae_pth": "/path/to/Wan2.1_VAE.pth",                   // Original VAE model path
    "use_tiny_vae": true,                                   // Whether to use lightweight VAE
    "tiny_vae_path": "/path/to/taew2_1.pth"                 // Lightweight VAE model path
}
```

> **Configuration Notes**:
> - Quantized and original-precision weights can be mixed freely; the system automatically selects the corresponding model based on the configuration
> - The choice of quantization mode depends on your hardware; FP8 is recommended on high-end GPUs such as H100/A100
> - The lightweight VAE significantly improves inference speed but may slightly affect generation quality
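
Putting the pieces together, a hypothetical mixed configuration (quantized DIT and T5, original-precision CLIP, lightweight VAE) might look like the sketch below; only the keys documented above are used, and all paths are placeholders:

```json
{
    "dit_quantized_ckpt": "/path/to/dit_quantized_ckpt",
    "mm_config": {
        "mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
    },
    "t5_quantized": true,
    "t5_quant_scheme": "fp8",
    "t5_quantized_ckpt": "/path/to/t5_quantized_ckpt",
    "clip_quantized": false,
    "clip_original_ckpt": "/path/to/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "vae_pth": "/path/to/Wan2.1_VAE.pth",
    "use_tiny_vae": true,
    "tiny_vae_path": "/path/to/taew2_1.pth"
}
```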

## 💡 Best Practices

### Recommended Configurations

**Complete Model Users**:
- Download the complete model to benefit from automatic path discovery
- Only the quantization scheme and component switches need to be configured
- Bash scripts are recommended for quick startup

**Storage Space Limited Users**:
- Selectively download only the required quantized versions
- Mix quantized and original-precision components as needed
- Use bash scripts to simplify the startup process

**Advanced Users**:
- Configure all paths manually for maximum flexibility
- Model files can be stored in scattered locations
- Bash script parameters can be customized

### Performance Optimization Recommendations

- **Use SSD Storage**: Significantly improves model loading speed and inference startup performance
- **Choose an Appropriate Quantization Scheme**:
  - FP8: for high-end GPUs like H100/A100, higher precision
  - INT8: for general-purpose GPUs, smaller memory footprint
- **Enable Lightweight VAE**: `use_tiny_vae: true` improves inference speed
- **Configure CPU Offload Appropriately**: `t5_cpu_offload: true` saves GPU memory (see the sketch below)
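
As a sketch, these switches would go into the configuration file like this (paths are placeholders; `t5_cpu_offload` is taken from the tip above):

```json
{
    "use_tiny_vae": true,                      // Faster VAE decoding, slight quality trade-off
    "tiny_vae_path": "/path/to/taew2_1.pth",
    "t5_cpu_offload": true                     // Offload T5 to CPU to save GPU memory
}
```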

### Download Optimization Recommendations

- **Use Hugging Face CLI**: More stable than `git clone` and supports resumable downloads
- **Selective Download**: Download only the quantized versions you need to save time and storage space
- **Network Optimization**: Use a stable network connection, and a proxy or mirror when necessary
- **Resume Download**: Use the `--resume-download` flag to continue an interrupted download (see the combined example below)
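
Combining these tips, a hedged example of a mirrored, resumable, selective download (HF-Mirror is used here only as an example endpoint; substitute whichever mirror or proxy you use):

```bash
# Route the download through a mirror and resume it if interrupted
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --resume-download \
    --include "distill_fp8/*"
```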

## 🚨 Frequently Asked Questions

### Q: The model files are large and downloads are slow. What should I do?
A: Use selective download to fetch only the quantized versions you need, or download through a Hugging Face mirror site.

### Q: The model path is reported as not existing at startup?
A: Check that the model has been downloaded correctly, verify the path configuration, and confirm that automatic path discovery is working properly.

### Q: How do I switch between quantization schemes?
A: Modify parameters such as `mm_type`, `t5_quant_scheme`, and `clip_quant_scheme` in the configuration file; see the [Quantization Documentation](../method_tutorials/quantization.md).

### Q: How do I mix quantized and original-precision components?
A: Control this with the `t5_quantized` and `clip_quantized` parameters and specify the original-precision paths manually.

### Q: How do I set paths in the configuration file?
A: Automatic path discovery is recommended; for manual setup, refer to the "Manual Configuration" section.

### Q: How can I verify that automatic path discovery is working?
A: Check the startup logs; the actual model paths being used are printed there.

### Q: What should I do if the bash script fails to start?
A: Check the path configuration in the script and make sure the `lightx2v_path` and `model_path` variables are set correctly.

## 📚 Related Links

- [LightX2V Official Model Repository](https://huggingface.co/lightx2v)
- [Gradio Deployment Guide](./deploy_gradio.md)
- [Configuration File Examples](https://github.com/ModelTC/LightX2V/tree/main/configs)

---

With a well-organized model directory and flexible configuration options, LightX2V supports a range of usage scenarios: downloading the complete model is the most convenient, selective download saves storage space, and manual configuration offers the greatest flexibility. The automatic path discovery mechanism spares users from remembering complex path configurations while keeping the system extensible.