> This document showcases the performance test results of LightX2V across different hardware environments, including detailed comparison data for H200 and RTX 4090 platforms.
| Configuration | Description |
|:--------------|:------------|
| **Wan2.1 Official** | Based on the [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1) original implementation |
| **FastVideo** | Based on [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using SageAttention2 backend optimization |
| **LightX2V_1** | Replaces the native attention mechanism with SageAttention2 and uses DIT mixed-precision computation (BF16, with FP32 for a few sensitive layers), improving computational efficiency while maintaining precision |
| **LightX2V_2** | Unified BF16 precision computation, further reducing memory usage and computational overhead while maintaining generation quality |
| **LightX2V_3** | Introduces FP8 quantization technology to significantly reduce computational precision requirements, combined with Tiling VAE technology to optimize memory usage |
| **LightX2V_3-Distill** | Based on LightX2V_3, uses a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
| **LightX2V_4** | Based on LightX2V_3, adds TeaCache (`teacache_thresh=0.2`) cache-reuse technology, achieving acceleration by intelligently skipping redundant computation |
| **Wan2GP(profile=3)** | Implementation based on [Wan2GP repository](https://github.com/deepbeepmeep/Wan2GP), using MMGP optimization technology. Profile=3 configuration is suitable for RTX 3090/4090 environments with at least 32GB RAM and 24GB VRAM, adapting to limited memory resources by sacrificing VRAM. Uses quantized models: [480P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors) and [720P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors) |
| **LightX2V_5** | Replaces the native attention mechanism with SageAttention2, uses DIT mixed-precision computation (FP8, with FP32 for a few sensitive layers), and enables CPU offload: data from the DIT inference process is asynchronously offloaded to the CPU at block-level granularity to save VRAM |
| **LightX2V_5-Distill** | Based on LightX2V_5, uses a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
| **LightX2V_6** | Based on LightX2V_3 with CPU offload enabled: a few sensitive layers run in FP32, and data from the DIT inference process is asynchronously offloaded to the CPU at block-level granularity to save VRAM |
| **LightX2V_6-Distill** | Based on LightX2V_6, uses a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
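As a concrete illustration of the parameters quoted above, the 4-step distillation variants boil down to a couple of config fields. This is only a sketch (field names taken from the parameters shown in the table and the JSON examples later in this documentation); the actual benchmark configs live in [configs/bench](https://github.com/ModelTC/LightX2V/tree/main/configs/bench):

```bash
# Sketch only: writes a fragment with the two distillation-related fields
# named above; real benchmark configs contain additional keys.
cat > distill_fragment.json << 'EOF'
{
  "infer_steps": 4,
  "enable_cfg": false
}
EOF
```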
---
### 720P 5s Video
*Coming soon...*
## 📁 Configuration Files Reference
Benchmark-related configuration files and execution scripts are available at:
| Type | Link | Description |
|:-----|:-----|:------------|
| **Configuration Files** | [configs/bench](https://github.com/ModelTC/LightX2V/tree/main/configs/bench) | Contains JSON files with various optimization configurations |
> 💡 **Tip**: It is recommended to choose the appropriate optimization solution based on your hardware configuration to achieve the best performance.
Welcome to LightX2V! This guide will help you quickly set up the environment and start using LightX2V for video generation.
We recommend using a Docker environment. See the LightX2V [Docker Hub](https://hub.docker.com/r/lightx2v/lightx2v/tags) page and select the tag with the latest date, for example, `25061301`.
Before installing, make sure your system meets the following requirements:
- **Operating System**: Linux (Ubuntu 18.04+) or Windows 10/11
- **Python**: 3.10 or higher
- **GPU**: NVIDIA GPU with CUDA support, at least 8GB VRAM
- **Memory**: 16GB RAM or more recommended
- **Storage**: At least 50GB available space
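If you want to sanity-check these prerequisites, the following standard commands (nothing LightX2V-specific) report the relevant information:

```bash
# Check GPU model, driver version, and available VRAM
nvidia-smi
# Check Python version (3.10 or higher required)
python --version
# Check available disk space on the current filesystem
df -h .
```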
## 🐧 Linux Environment Setup
### 🐳 Docker Environment (Recommended)
We strongly recommend using the Docker environment, which is the simplest and fastest installation method.
#### 1. Pull Image
Visit LightX2V's [Docker Hub](https://hub.docker.com/r/lightx2v/lightx2v/tags) and select a tag with the latest date, such as `25061301`:
```bash
# Pull the latest version of the LightX2V image
docker pull lightx2v/lightx2v:25061301
```
#### 2. Run Container
```bash
docker run --gpus all -itd --ipc=host --name [container_name] -v [mount_settings] --entrypoint /bin/bash [image_id]
```
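For example, a filled-in command might look like the following; the container name, mount paths, and image tag here are placeholder values, not required names:

```bash
# Placeholder values: adjust the container name, volume mount, and image tag to your setup.
docker run --gpus all -itd --ipc=host \
  --name lightx2v \
  -v /data/models:/workspace/models \
  --entrypoint /bin/bash \
  lightx2v/lightx2v:25061301
```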
#### 3. Domestic Mirror Source (Optional)
For users in mainland China, if the network is unstable when pulling images, you can pull from [Duduniao](https://docker.aityp.com/r/docker.io/lightx2v/lightx2v):
### 🐍 Conda Environment Setup (Optional)
If you want to set up the environment yourself using conda, you can refer to the following steps:
> 💡 **Note**: The Hunyuan model needs to run under transformers version 4.45.2. If you don't need to run the Hunyuan model, you can skip the transformers version restriction.
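For example, a minimal way to apply that pin inside your environment (assuming pip is used for installation) is:

```bash
# The Hunyuan model needs this transformers version; skip if you don't use Hunyuan.
pip install transformers==4.45.2
```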
## 🪟 Windows Environment Setup
> ⚠️ **Note**: SageAttention's CUDA version doesn't need to be strictly aligned, but Python and PyTorch versions must match.
#### Step 6: Clone Repository
```cmd
# Clone project code
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
# Install Windows-specific dependencies
pip install -r requirements_win.txt
```
## 🎯 Inference Usage
### 📥 Model Preparation
Before starting inference, you need to download the model files in advance. We recommend:
- **Download Source**: Download models from the [LightX2V official Hugging Face](https://huggingface.co/lightx2v/) or other open-source model repositories (see the example below)
- **Storage Location**: It's recommended to store models on SSD disks for better read performance
- **Available Models**: Including Wan2.1-I2V, Wan2.1-T2V, and other models supporting different resolutions and functionalities
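For example, one way to fetch a model is with the `huggingface-cli` tool from `huggingface_hub`; the repository name below is a placeholder, so substitute the actual model repository you need from the LightX2V Hugging Face page:

```bash
# Placeholder repository name: substitute the model you actually need.
pip install -U "huggingface_hub[cli]"
huggingface-cli download lightx2v/<model-repo-name> --local-dir /path/to/models/<model-repo-name>
```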
### 📁 Configuration Files and Scripts
The configuration files used for inference are available [here](https://github.com/ModelTC/LightX2V/tree/main/configs), and scripts are available [here](https://github.com/ModelTC/LightX2V/tree/main/scripts).
You need to configure the downloaded model path in the run script. In addition to the input arguments in the script, there are also some necessary parameters in the configuration file specified by `--config_json`. You can modify them as needed.
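As a rough sketch of that workflow, using the T2V script referenced in the next section (the config file path is an assumption; use whichever config your script actually points to):

```bash
# 1. Edit the run script and set the path to your downloaded model weights.
vim scripts/wan/run_wan_t2v.sh
# 2. Review the parameters in the config file passed via --config_json and
#    adjust them as needed (assumed location; check the path used in the script).
cat configs/wan_t2v.json
```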
### 🚀 Start Inference
#### Linux Environment
```bash
# Run after modifying the path in the script
bash scripts/wan/run_wan_t2v.sh
```
In addition to the existing input arguments in the script, there are also some necessary parameters in the `wan_t2v.json` file specified by `--config_json`. You can modify them as needed.
#### Windows Environment
```cmd
# Use Windows batch script
scripts\win\run_wan_t2v.bat
```
## 📞 Get Help
If you encounter problems during installation or usage, please:
1. Search for related issues in [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
2. Submit a new Issue describing your problem
---
🎉 **Congratulations!** You have successfully set up the LightX2V environment and can now start enjoying video generation!
Variable resolution inference is a technical strategy for optimizing the denoising process. It improves computational efficiency while maintaining generation quality by using different resolutions at different stages of the denoising process. The core idea is to use a lower resolution for coarse denoising in the early stages and switch back to the normal resolution for fine processing in the later stages.
## Technical Principles
### Multi-stage Denoising Strategy
Variable resolution inference is based on the following observations:
- **Early-stage denoising**: Mainly handles coarse noise and overall structure, requiring less detailed information
- **Late-stage denoising**: Focuses on detail optimization and high-frequency information recovery, requiring complete resolution information
### Resolution Switching Mechanism
1. **Low-resolution stage** (early steps)
   - Downsample the input to a lower resolution (e.g., 0.75x of the original size)
   - Execute the initial denoising steps
   - Quickly remove most of the noise and establish the basic structure
2. **Normal-resolution stage** (later steps)
   - Upsample the result of the low-resolution stage back to the original resolution
   - Continue executing the remaining denoising steps
   - Restore detail information and complete the fine processing
### U-shaped Resolution Strategy
If the resolution is reduced at the very beginning of the denoising steps, the final generated video may differ significantly from the video generated through normal inference. Therefore, a U-shaped resolution strategy can be adopted, where the original resolution is maintained for the first few steps before the resolution is reduced; Example 2 below illustrates such a schedule.
## Usage
The config files for variable resolution inference are available [here](https://github.com/ModelTC/LightX2V/tree/main/configs/changing_resolution), and reference run scripts are available [here](https://github.com/ModelTC/LightX2V/blob/main/scripts/changing_resolution).
You can test variable resolution inference by pointing `--config_json` at the specific config file.
### Example 1:
```json
{
"infer_steps": 50,
"changing_resolution": true,
"resolution_rate": [0.75],
"changing_resolution_steps": [25]
}
```
This means a total of 50 steps, with resolution at 0.75x original resolution from step 0 to 25, and original resolution from step 26 to the final step.
### Example 2:
```json
{
"infer_steps": 50,
"changing_resolution": true,
"resolution_rate": [1.0, 0.75],
"changing_resolution_steps": [10, 35]
}
```
This means a total of 50 steps, with original resolution from step 0 to 10, 0.75x original resolution from step 11 to 35, and original resolution from step 36 to the final step.
For a complete run, you can refer to [this script](https://github.com/ModelTC/LightX2V/blob/main/scripts/wan/run_wan_t2v_changing_resolution.sh).
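To make the mapping from `resolution_rate` / `changing_resolution_steps` to per-step resolutions explicit, here is a small illustrative loop that mirrors the description of Example 2 (this is only a reading aid; the actual scheduling happens inside LightX2V):

```bash
# Illustration of Example 2's schedule: rates[i] applies up to and including
# bounds[i]; any step past the last boundary runs at the original resolution.
infer_steps=50
rates=(1.0 0.75)   # resolution_rate
bounds=(10 35)     # changing_resolution_steps

for ((step = 0; step < infer_steps; step++)); do
  rate="original"
  for i in "${!bounds[@]}"; do
    if (( step <= bounds[i] )); then
      rate="${rates[i]}x"
      break
    fi
  done
  echo "step ${step}: ${rate} resolution"
done
```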