Commit 34df26f6 authored by GoatWu

Merge branch 'main' of github.com:ModelTC/lightx2v into main

parents 9a53e8e3 a5cb22ce
{
"infer_steps": 40,
"target_video_length": 81,
"target_height": 480,
"target_width": 832,
"self_attn_1_type": "radial_attn",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3",
"seed": 42,
"sample_guide_scale": 5,
"sample_shift": 5,
"enable_cfg": true,
"cpu_offload": false
}
# Attention Mechanisms

The DiT model in `LightX2V` currently uses attention in three places, and each one can be configured independently to use a specific backend attention library.
---
## Attention Usage Locations
1. **Self-Attention on the image**
- Configuration key: `self_attn_1_type`
2. **Cross-Attention between image and prompt text**
- Configuration key: `cross_attn_1_type`
3. **Cross-Attention between image and reference image (in I2V mode)**
- Configuration key: `cross_attn_2_type`
---
## Attention Mechanisms Supported by LightX2V
| Name | Type Identifier | GitHub Link |
|--------------------|-------------------|-------------|
| Flash Attention 2 | `flash_attn2` | [flash-attention v2](https://github.com/Dao-AILab/flash-attention) |
| Flash Attention 3 | `flash_attn3` | [flash-attention v3](https://github.com/Dao-AILab/flash-attention) |
| Sage Attention 2 | `sage_attn2` | [SageAttention](https://github.com/thu-ml/SageAttention) |
| Radial Attention | `radial_attn` | [Radial Attention](https://github.com/mit-han-lab/radial-attention) |
| Sparge Attention | `sparge_ckpt` | [Sparge Attention](https://github.com/thu-ml/SpargeAttn) |
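
To make the role of these type identifiers concrete, below is a minimal, illustrative Python sketch of how a backend could be selected from a config string. This is not LightX2V's internal dispatch code; `flash_attn_func` and `sageattn` are the public entry points of the flash-attention and SageAttention packages, and the `(batch, seq_len, heads, head_dim)` tensor layout is an assumption for this example.

```python
# Illustrative sketch only -- not LightX2V's actual dispatch logic.
# Shows how a config string such as "flash_attn2" or "sage_attn2"
# could select an attention backend at runtime.
import torch.nn.functional as F


def attention(q, k, v, attn_type="flash_attn2"):
    """q, k, v: tensors shaped (batch, seq_len, num_heads, head_dim)."""
    if attn_type == "flash_attn2":
        # flash-attention v2 exposes flash_attn_func with (B, S, H, D) inputs.
        from flash_attn import flash_attn_func
        return flash_attn_func(q, k, v)
    if attn_type == "sage_attn2":
        # SageAttention takes an explicit layout; "NHD" matches (B, S, H, D).
        from sageattention import sageattn
        return sageattn(q, k, v, tensor_layout="NHD")
    # Fallback: PyTorch SDPA, which expects (B, H, S, D).
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
    )
    return out.transpose(1, 2)
```

In LightX2V itself the selection is driven entirely by the `self_attn_1_type`, `cross_attn_1_type`, and `cross_attn_2_type` keys described above, so switching backends is purely a configuration change.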
---
## Configuration Examples

The configuration files for the attention mechanisms are located [here](https://github.com/ModelTC/lightx2v/tree/main/configs/attentions). By pointing `--config_json` at a specific config file, you can test the different attention mechanisms.

For example, for `radial_attn`, the configuration is as follows:
```json
{
    "self_attn_1_type": "radial_attn",
    "cross_attn_1_type": "flash_attn3",
    "cross_attn_2_type": "flash_attn3"
}
```
To use other attention backends, simply replace the values with the appropriate type identifiers listed above.
Tip: Due to the limitations of its sparse algorithm, `radial_attn` can only be used for self-attention.
---
For Sparge Attention, refer to the `wan_t2v_sparge.json` configuration file. Sparge Attention requires the path to a post-trained weight checkpoint, supplied via `sparge_ckpt`:
```json
{
"self_attn_1_type": "flash_attn3",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3"
"sparge": true,
"sparge_ckpt": "/path/to/sparge_wan2.1_t2v_1.3B.pt"
}
```
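
Since `sparge_ckpt` must point to an existing post-trained weight file, a quick sanity check before launching inference can save a failed run. The snippet below is optional and purely illustrative; the path is the placeholder from the config above, and it assumes the checkpoint is an ordinary PyTorch-serialized, dict-like object.

```python
# Optional sanity check (illustrative): verify the Sparge checkpoint path
# exists and is loadable before starting inference.
import os
import torch

ckpt_path = "/path/to/sparge_wan2.1_t2v_1.3B.pt"  # placeholder from the config above

assert os.path.isfile(ckpt_path), f"sparge_ckpt not found: {ckpt_path}"
state = torch.load(ckpt_path, map_location="cpu")  # assumes a dict-like state
print(f"Loaded Sparge checkpoint with {len(state)} top-level entries")
```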
---
For further customization of attention mechanism behavior, please refer to the official documentation or implementation code of each attention library.
# Step Distillation
Step distillation is an important optimization technique in LightX2V. By training distilled models, it significantly reduces inference steps from the original 40-50 steps to **4 steps**, dramatically improving inference speed while maintaining video quality. LightX2V implements step distillation along with CFG distillation to further enhance inference speed.
# Attention Mechanisms

The DiT model in `LightX2V` currently uses attention in three places, and each one can be configured independently to use a specific backend attention library.
---
## Attention Usage Locations

1. **Self-Attention on the image**
- Configuration key: `self_attn_1_type`
2. **Cross-Attention between the image and the prompt text**
- Configuration key: `cross_attn_1_type`
3. **Cross-Attention between the image and the reference image (I2V mode)**
- Configuration key: `cross_attn_2_type`
---
## Attention Mechanisms Supported by LightX2V

| Name | Type Identifier | GitHub Link |
|--------------------|-------------------|-------------|
| Flash Attention 2 | `flash_attn2` | [flash-attention v2](https://github.com/Dao-AILab/flash-attention) |
| Flash Attention 3 | `flash_attn3` | [flash-attention v3](https://github.com/Dao-AILab/flash-attention) |
| Sage Attention 2 | `sage_attn2` | [SageAttention](https://github.com/thu-ml/SageAttention) |
| Radial Attention | `radial_attn` | [Radial Attention](https://github.com/mit-han-lab/radial-attention) |
| Sparge Attention | `sparge_ckpt` | [Sparge Attention](https://github.com/thu-ml/SpargeAttn) |
---
## Configuration Examples

The configuration files for the attention mechanisms are located [here](https://github.com/ModelTC/lightx2v/tree/main/configs/attentions). By pointing `--config_json` at a specific config file, you can test the different attention mechanisms.

For example, for `radial_attn`, the configuration is as follows:
```json
{
    "self_attn_1_type": "radial_attn",
    "cross_attn_1_type": "flash_attn3",
    "cross_attn_2_type": "flash_attn3"
}
```
Tip: Due to the limitations of its sparse algorithm, `radial_attn` can only be used for self-attention.
---
For Sparge Attention, refer to the `wan_t2v_sparge.json` configuration file. Sparge Attention requires the path to a post-trained weight checkpoint, supplied via `sparge_ckpt`:
```json
{
"self_attn_1_type": "flash_attn3",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3"
"sparge": true,
"sparge_ckpt": "/path/to/sparge_wan2.1_t2v_1.3B.pt"
}
```
---
For further customization of attention mechanism behavior, please refer to the official documentation or implementation code of each attention library.
# Step Distillation
Step distillation is an important optimization technique in LightX2V. By training distilled models, it reduces the number of inference steps from the original 40-50 down to **4**, dramatically improving inference speed while maintaining video quality. Alongside step distillation, LightX2V also implements CFG distillation to further improve inference speed.