The DiT model in `LightX2V` currently uses attention in three places, and each one can be configured to use a specific backend library. The usage locations and their configuration keys are listed below; a sample configuration follows the list.
---
## Attention Usage Locations
1. **Self-Attention on the image**
   - Configuration key: `self_attn_1_type`
2. **Cross-Attention between image and prompt text**
   - Configuration key: `cross_attn_1_type`
3. **Cross-Attention between image and reference image (in I2V mode)**
   - Configuration key: `cross_attn_2_type`