# LightX2V Usage Examples

This document introduces how to use LightX2V for video generation, including basic usage and advanced configurations.

## 📋 Table of Contents

- [Environment Setup](#environment-setup)
- [Basic Usage Examples](#basic-usage-examples)
- [Model Path Configuration](#model-path-configuration)
- [Creating Generator](#creating-generator)
- [Advanced Configurations](#advanced-configurations)
  - [Parameter Offloading](#parameter-offloading)
  - [Model Quantization](#model-quantization)
  - [Parallel Inference](#parallel-inference)
  - [Feature Caching](#feature-caching)
  - [Lightweight VAE](#lightweight-vae)

## 🔧 Environment Setup

Please refer to the main project's [Quick Start Guide](../docs/EN/source/getting_started/quickstart.md) for environment setup.

## 🚀 Basic Usage Examples

A minimal code example can be found in `examples/wan_t2v.py`:

```python
from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.1-T2V-14B",
    model_cls="wan2.1",
    task="t2v",
)

pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=50,
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    sample_shift=5.0,
)

seed = 42
prompt = "Your prompt here"
negative_prompt = ""
save_result_path = "/path/to/save_results/output.mp4"

pipe.generate(
    seed=seed,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
```

## 📁 Model Path Configuration

### Basic Configuration

Pass the model path to `LightX2VPipeline`:

```python
pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.2-I2V-A14B",
    model_cls="wan2.2_moe",  # For wan2.1, use "wan2.1"
    task="i2v",
)
```

### Specifying Multiple Model Weight Versions

When the `model_path` directory contains multiple versions of the bf16-precision DIT model safetensors files, use the following parameters to specify which weights to load:

- **`dit_original_ckpt`**: Specifies the original DIT weight path for single-DIT models such as wan2.1 and HunyuanVideo-1.5
- **`low_noise_original_ckpt`**: Specifies the low-noise branch weight path for wan2.2 models
- **`high_noise_original_ckpt`**: Specifies the high-noise branch weight path for wan2.2 models

**Usage Example:**

```python
pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.2-I2V-A14B",
    model_cls="wan2.2_moe",
    task="i2v",
    low_noise_original_ckpt="/path/to/low_noise_model.safetensors",
    high_noise_original_ckpt="/path/to/high_noise_model.safetensors",
)
```

## 🎛️ Creating Generator

### Loading from Configuration File

The generator can be loaded directly from a JSON configuration file. Configuration files are located in the `configs` directory:

```python
pipe.create_generator(config_json="../configs/wan/wan_t2v.json")
```

### Creating Generator Manually

You can also create the generator manually and configure multiple parameters:

```python
pipe.create_generator(
    attn_mode="flash_attn2",  # Options: flash_attn2, flash_attn3, sage_attn2, sage_attn3 (Blackwell-architecture GPUs)
    infer_steps=50,           # Number of inference steps
    num_frames=81,            # Number of video frames
    height=480,               # Video height
    width=832,                # Video width
    guidance_scale=5.0,       # CFG guidance strength (set to 1 to disable CFG)
    sample_shift=5.0,         # Sample shift
    fps=16,                   # Frame rate
    aspect_ratio="16:9",      # Aspect ratio
    boundary=0.900,           # Boundary value
    boundary_step_index=2,    # Boundary step index
    denoising_step_list=[1000, 750, 500, 250],  # Denoising step list
)
```

**Parameter Description:**
- **Resolution**: Specified via `height` and `width`
- **CFG**: Specified via `guidance_scale` (set to 1 to disable CFG)
- **FPS**: Specified via `fps`
- **Video Length**: Specified via `num_frames`
- **Inference Steps**: Specified via `infer_steps`
- **Sample Shift**: Specified via `sample_shift`
- **Attention Mode**: Specified via `attn_mode`, options include `flash_attn2`, `flash_attn3`, `sage_attn2`, `sage_attn3` (requires Blackwell-architecture GPUs)
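
Building on the parameters above, a few-step distilled configuration might look like the sketch below. This is an illustration only, assuming a 4-step distilled checkpoint such as the `*_4step` weights referenced in the Model Quantization section, and assuming `infer_steps` matches the length of `denoising_step_list`:

```python
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=4,                              # assumed to match the 4-entry denoising schedule
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=1.0,                         # 1 disables CFG, as noted above
    sample_shift=5.0,
    denoising_step_list=[1000, 750, 500, 250],  # explicit denoising schedule
)
```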

## ⚙️ Advanced Configurations

**⚠️ Important: In addition to the generator parameters above, the pipeline supports several advanced options, configured through the `enable_*` methods described below. All of them must be called before `create_generator()`, otherwise they will not take effect!**
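
For example, parameter offloading only takes effect if it is enabled before the generator is created. A minimal ordering sketch using only the calls documented in this README (the remaining `enable_offload` arguments are assumed to keep their defaults):

```python
from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.1-T2V-14B",
    model_cls="wan2.1",
    task="t2v",
)

# 1. Configure advanced options first ...
pipe.enable_offload(cpu_offload=True, offload_granularity="block")

# 2. ... then create the generator ...
pipe.create_generator(config_json="../configs/wan/wan_t2v.json")

# 3. ... and finally generate.
pipe.generate(
    seed=42,
    prompt="Your prompt here",
    negative_prompt="",
    save_result_path="/path/to/save_results/output.mp4",
)
```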

### Parameter Offloading

Significantly reduces memory usage with almost no impact on inference speed. Suitable for RTX 30/40/50 series GPUs.

```python
pipe.enable_offload(
    cpu_offload=True,              # Enable CPU offloading
    offload_granularity="block",   # Offload granularity: "block" or "phase"
    text_encoder_offload=False,    # Whether to offload text encoder
    image_encoder_offload=False,   # Whether to offload image encoder
    vae_offload=False,             # Whether to offload VAE
)
```

**Notes:**
- For Wan models, `offload_granularity` supports both `"block"` and `"phase"`
- For HunyuanVideo-1.5, only `"block"` is currently supported

### Model Quantization

Quantization can significantly reduce memory usage and accelerate inference.

```python
pipe.enable_quantize(
    dit_quantized=False,                    # Whether to use quantized DIT model
    text_encoder_quantized=False,           # Whether to use quantized text encoder
    image_encoder_quantized=False,          # Whether to use quantized image encoder
    dit_quantized_ckpt=None,                # DIT quantized weight path (required when model_path doesn't contain quantized weights or has multiple weight files)
    low_noise_quantized_ckpt=None,          # Wan2.2 low noise branch quantized weight path
    high_noise_quantized_ckpt=None,         # Wan2.2 high noise branch quantized weight path
    text_encoder_quantized_ckpt=None,       # Text encoder quantized weight path (required when model_path doesn't contain quantized weights or has multiple weight files)
    image_encoder_quantized_ckpt=None,      # Image encoder quantized weight path (required when model_path doesn't contain quantized weights or has multiple weight files)
    quant_scheme="fp8-sgl",                 # Quantization scheme
)
```

**Parameter Description:**
- **`dit_quantized_ckpt`**: When the `model_path` directory doesn't contain quantized weights, or has multiple weight files, you need to specify the specific DIT quantized weight path
- **`text_encoder_quantized_ckpt`** and **`image_encoder_quantized_ckpt`**: Similarly, used to specify encoder quantized weight paths
- **`low_noise_quantized_ckpt`** and **`high_noise_quantized_ckpt`**: Used to specify dual-branch quantized weights for Wan2.2 models

**Quantized Model Downloads:**

- **Wan-2.1 Quantized Models**: Download from [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- **Wan-2.2 Quantized Models**: Download from [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- **HunyuanVideo-1.5 Quantized Models**: Download from [Hy1.5-Quantized-Models](https://huggingface.co/lightx2v/Hy1.5-Quantized-Models)
  - `hy15_qwen25vl_llm_encoder_fp8_e4m3_lightx2v.safetensors` is the quantized weight for the text encoder

**Usage Examples:**

```python
# HunyuanVideo-1.5 Quantization Example
pipe.enable_quantize(
    quant_scheme='fp8-sgl',
    dit_quantized=True,
    dit_quantized_ckpt="/path/to/hy15_720p_i2v_fp8_e4m3_lightx2v.safetensors",
    text_encoder_quantized=True,
    image_encoder_quantized=False,
    text_encoder_quantized_ckpt="/path/to/hy15_qwen25vl_llm_encoder_fp8_e4m3_lightx2v.safetensors",
)

# Wan2.1 Quantization Example
pipe.enable_quantize(
    dit_quantized=True,
    dit_quantized_ckpt="/path/to/wan2.1_i2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors",
)

# Wan2.2 Quantization Example
pipe.enable_quantize(
    dit_quantized=True,
    low_noise_quantized_ckpt="/path/to/wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors",
    high_noise_quantized_ckpt="/path/to/wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step_1030.safetensors",
)
```

**Quantization Scheme Reference:** For detailed information, please refer to the [Quantization Documentation](../docs/EN/source/method_tutorials/quantization.md)

### Parallel Inference

Supports multi-GPU parallel inference. Requires running with `torchrun`:

```python
pipe.enable_parallel(
    seq_p_size=4,                    # Sequence parallel size
    seq_p_attn_type="ulysses",       # Sequence parallel attention type
)
```

**Running Method:**
```bash
torchrun --nproc_per_node=4 your_script.py
```
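
Putting it together, a complete multi-GPU script might look like the sketch below (only calls documented in this README are used; `seq_p_size` is assumed to match `--nproc_per_node`):

```python
# your_script.py -- launch with: torchrun --nproc_per_node=4 your_script.py
from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.1-T2V-14B",
    model_cls="wan2.1",
    task="t2v",
)

# Enable sequence parallelism before creating the generator
pipe.enable_parallel(
    seq_p_size=4,               # assumed to match the number of torchrun processes
    seq_p_attn_type="ulysses",
)

pipe.create_generator(config_json="../configs/wan/wan_t2v.json")

pipe.generate(
    seed=42,
    prompt="Your prompt here",
    negative_prompt="",
    save_result_path="/path/to/save_results/output.mp4",
)
```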

### Feature Caching

Feature caching reuses intermediate features across denoising steps to skip redundant computation. Specify the cache method as `'Tea'` (TeaCache) or `'Mag'` (MagCache):

```python
pipe.enable_cache(
    cache_method='Tea',  # Cache method: 'Tea' or 'Mag'
    coefficients=[-3.08907507e+04, 1.67786188e+04, -3.19178643e+03,
                  2.60740519e+02, -8.19205881e+00, 1.07913775e-01],  # Coefficients
    teacache_thresh=0.15,  # TeaCache threshold
)
```

**Coefficient Reference:** See the configuration files in the `configs/caching` or `configs/hunyuan_video_15/cache` directories

### Lightweight VAE

Using a lightweight VAE (LightVAE) or LightTAE can accelerate decoding and reduce memory usage.

```python
pipe.enable_lightvae(
    use_lightvae=False,    # Whether to use LightVAE
    use_tae=False,         # Whether to use LightTAE
    vae_path=None,         # Path to LightVAE
    tae_path=None,         # Path to LightTAE
)
```

**Support Status:**
- **LightVAE**: Currently only supports wan2.1, wan2.2 moe
- **LightTAE**: Currently only supports wan2.1, wan2.2-ti2v, wan2.2 moe, HunyuanVideo-1.5

**Model Downloads:** Lightweight VAE models can be downloaded from [Autoencoders](https://huggingface.co/lightx2v/Autoencoders)

- LightVAE for Wan-2.1: [lightvaew2_1.safetensors](https://huggingface.co/lightx2v/Autoencoders/blob/main/lightvaew2_1.safetensors)
- LightTAE for Wan-2.1: [lighttaew2_1.safetensors](https://huggingface.co/lightx2v/Autoencoders/blob/main/lighttaew2_1.safetensors)
- LightTAE for Wan-2.2-ti2v: [lighttaew2_2.safetensors](https://huggingface.co/lightx2v/Autoencoders/blob/main/lighttaew2_2.safetensors)
- LightTAE for HunyuanVideo-1.5: [lighttaehy1_5.safetensors](https://huggingface.co/lightx2v/Autoencoders/blob/main/lighttaehy1_5.safetensors)

**Usage Example:**

```python
# Using LightTAE for HunyuanVideo-1.5
pipe.enable_lightvae(
    use_tae=True,
    tae_path="/path/to/lighttaehy1_5.safetensors",
    use_lightvae=False,
    vae_path=None
)
```
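
Similarly, a sketch of enabling LightVAE for Wan-2.1 with the `lightvaew2_1.safetensors` weight listed above:

```python
# Using LightVAE for Wan-2.1
pipe.enable_lightvae(
    use_lightvae=True,
    vae_path="/path/to/lightvaew2_1.safetensors",
    use_tae=False,
    tae_path=None,
)
```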

## 📚 More Resources

- [Full Documentation](https://lightx2v-en.readthedocs.io/en/latest/)
- [GitHub Repository](https://github.com/ModelTC/LightX2V)
- [HuggingFace Model Hub](https://huggingface.co/lightx2v)