## Attention Mechanisms Supported by LightX2V

---

## Attention Usage Locations

The DiT model in `LightX2V` currently uses three types of attention. Each type of attention can be configured to use a specific backend library via the configuration keys listed below (see the example config after the list):
1. **Self-Attention on the image**
   - Configuration key: `self_attn_1_type`
2. **Cross-Attention between the image and the prompt text**
   - Configuration key: `cross_attn_1_type`
3. **Cross-Attention between the image and the reference image (in I2V mode)**
   - Configuration key: `cross_attn_2_type`
Tip: `radial_attn` can only be used for self-attention due to the limitations of its sparse-attention algorithm.
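The snippet below is a minimal sketch of how these keys could be set in an inference config file. The key names come from the list above; the backend value `flash_attn3` is only an illustrative choice and assumes the corresponding library is installed, so substitute whichever supported backend you intend to use.

```json
{
  "self_attn_1_type": "flash_attn3",
  "cross_attn_1_type": "flash_attn3",
  "cross_attn_2_type": "flash_attn3"
}
```

The three keys can also be mixed, for example keeping `radial_attn` only for `self_attn_1_type` while the two cross-attention keys use a dense backend, consistent with the tip above.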
For further customization of attention mechanism behavior, please refer to the official documentation or implementation code of each attention library.
---

## Step Distillation

Step distillation is an important optimization technique in LightX2V. By training distilled models, it reduces the number of inference steps from the original 40-50 down to **4**, dramatically improving inference speed while maintaining video quality. LightX2V implements step distillation together with CFG distillation to further accelerate inference.
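As a rough sketch of what this means for an inference configuration, a step-distilled run uses 4 denoising steps and skips classifier-free guidance. The key names `infer_steps` and `enable_cfg` below are assumptions for illustration only; the configs shipped with LightX2V's step-distilled models are the authoritative reference.

```json
{
  "infer_steps": 4,
  "enable_cfg": false
}
```

Compared with a standard run (40-50 steps, each requiring two forward passes when CFG is enabled), this reduces the number of model evaluations by roughly a factor of 20.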