# 🚀 Benchmark
> This document presents LightX2V performance test results across hardware environments, with detailed comparison data for the H200 and RTX 4090 platforms.
---
## 🖥️ H200 Environment (~140GB VRAM)
### 📋 Software Environment Configuration
| Component | Version |
|:----------|:--------|
| **Python** | 3.11 |
| **PyTorch** | 2.7.1+cu128 |
| **SageAttention** | 2.2.0 |
| **vLLM** | 0.9.2 |
| **sgl-kernel** | 0.1.8 |
---
### 🎬 480P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=42`, `enable_cfg=True`
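The runs below are driven by the JSON files under `configs/bench` (linked at the bottom of this page). As a rough illustration only, a configuration with the parameters above might look like the following; this is a hypothetical sketch, not LightX2V's actual schema:

```json
{
  "infer_steps": 40,
  "seed": 42,
  "enable_cfg": true
}
```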
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 366 | 71 | 1.0x | |
| **FastVideo** | 292 | 26 | **1.25x** | |
| **LightX2V_1** | 250 | 53 | **1.46x** | |
| **LightX2V_2** | 216 | 50 | **1.70x** | |
| **LightX2V_3** | 191 | 35 | **1.92x** | |
| **LightX2V_3-Distill** | 14 | 35 | **🏆 26.14x** | |
| **LightX2V_4** | 107 | 35 | **3.41x** | |
---
### 🎬 720P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=1234`, `enable_cfg=True`
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 974 | 81 | 1.0x | |
| **FastVideo** | 914 | 40 | **1.07x** | |
| **LightX2V_1** | 807 | 65 | **1.21x** | |
| **LightX2V_2** | 751 | 57 | **1.30x** | |
| **LightX2V_3** | 671 | 43 | **1.45x** | |
| **LightX2V_3-Distill** | 44 | 43 | **🏆 22.14x** | |
| **LightX2V_4** | 344 | 46 | **2.83x** | |
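The Speedup column is simply the baseline (Wan2.1 Official) inference time divided by each configuration's time, rounded to two decimal places. A quick sanity check against the 720P numbers above:

```python
baseline = 974  # Wan2.1 Official, 720P 5s video, seconds

# configuration -> measured inference time in seconds (from the table above)
times = {
    "FastVideo": 914,
    "LightX2V_1": 807,
    "LightX2V_2": 751,
    "LightX2V_3": 671,
    "LightX2V_3-Distill": 44,
    "LightX2V_4": 344,
}

speedups = {name: round(baseline / t, 2) for name, t in times.items()}
print(speedups)
```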
---
## 🖥️ RTX 4090 Environment (~24GB VRAM)
### 📋 Software Environment Configuration
| Component | Version |
|:----------|:--------|
| **Python** | 3.9.16 |
| **PyTorch** | 2.5.1+cu124 |
| **SageAttention** | 2.1.0 |
| **vLLM** | 0.6.6 |
| **sgl-kernel** | 0.0.5 |
| **q8-kernels** | 0.0.0 |
---
### 🎬 480P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=42`, `enable_cfg=True`
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2GP(profile=3)** | 779 | 20 | 1.0x | |
| **LightX2V_5** | 738 | 16 | **1.05x** | |
| **LightX2V_5-Distill** | 68 | 16 | **11.45x** | |
| **LightX2V_6** | 630 | 12 | **1.24x** | |
| **LightX2V_6-Distill** | 63 | 12 | **🏆 12.36x** | |
---
### 🎬 720P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=1234`, `enable_cfg=True`
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2GP(profile=3)** | -- | OOM | -- | |
| **LightX2V_5** | 2473 | 23 | -- | |
| **LightX2V_5-Distill** | 183 | 23 | -- | |
| **LightX2V_6** | 2169 | 18 | -- | |
| **LightX2V_6-Distill** | 171 | 18 | -- | |
---
## 📖 Configuration Descriptions
### 🖥️ H200 Environment Configuration Descriptions
| Configuration | Technical Features |
|:--------------|:------------------|
| **Wan2.1 Official** | Based on [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1) original implementation |
| **FastVideo** | Based on [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using SageAttention2 backend optimization |
| **LightX2V_1** | Replaces the native attention mechanism with SageAttention2 and uses DiT BF16 + FP32 (for a few sensitive layers) mixed-precision computation, improving computational efficiency while preserving precision |
| **LightX2V_2** | Unified BF16 precision computation, further reducing memory usage and computational overhead while maintaining generation quality |
| **LightX2V_3** | Introduces FP8 quantization to significantly relax compute precision requirements, combined with Tiling VAE to optimize memory usage |
| **LightX2V_3-Distill** | LightX2V_3 with a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
| **LightX2V_4** | LightX2V_3 with TeaCache (`teacache_thresh=0.2`) caching reuse, accelerating inference by skipping redundant computation |
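The caching-reuse idea behind TeaCache can be sketched as follows. This is a simplified illustration, not LightX2V's implementation, and the function and variable names are made up: the loop tracks the accumulated relative drift of the model input since the last full forward pass, and while the drift stays below the threshold, it reuses the cached output instead of rerunning the DiT.

```python
import numpy as np

def denoise_with_cache(model, x, num_steps, thresh=0.2, lr=0.1):
    """Toy denoising loop with TeaCache-style output reuse.

    Tracks the relative (L1) drift of the input since the last full
    forward pass; while the drift stays below `thresh`, the cached
    model output is reused instead of being recomputed.
    Returns (final_x, number_of_full_forward_passes).
    """
    cached_out = None
    ref_x = None        # input at the last full forward pass
    full_steps = 0
    for _ in range(num_steps):
        if cached_out is not None:
            drift = np.abs(x - ref_x).sum() / (np.abs(ref_x).sum() + 1e-8)
            skip = drift < thresh
        else:
            skip = False
        if skip:
            out = cached_out          # reuse: redundant forward skipped
        else:
            out = model(x)            # full (expensive) forward pass
            cached_out, ref_x = out, x.copy()
            full_steps += 1
        x = x - lr * out              # toy update rule
    return x, full_steps
```

With `thresh=0.0` every step runs a full forward pass; raising the threshold trades a little accuracy for skipped computation.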
### 🖥️ RTX 4090 Environment Configuration Descriptions
| Configuration | Technical Features |
|:--------------|:------------------|
| **Wan2GP(profile=3)** | Based on the [Wan2GP repository](https://github.com/deepbeepmeep/Wan2GP) with MMGP optimization. The `profile=3` setting targets RTX 3090/4090 machines with at least 32GB RAM and 24GB VRAM, adapting to limited memory resources by sacrificing VRAM. Uses quantized models: [480P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors) and [720P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors) |
| **LightX2V_5** | Replaces the native attention mechanism with SageAttention2 and uses DiT FP8 + FP32 (for a few sensitive layers) mixed-precision computation; enables CPU offload at block-level granularity, asynchronously offloading DiT inference data to the CPU to save VRAM while keeping sensitive layers in FP32 |
| **LightX2V_5-Distill** | LightX2V_5 with a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
| **LightX2V_6** | LightX2V_3 with CPU offload enabled at block-level granularity: sensitive layers run in FP32, and DiT inference data is asynchronously offloaded to the CPU to save VRAM |
| **LightX2V_6-Distill** | LightX2V_6 with a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
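The block-level CPU offload described above follows a pattern that can be sketched like this. This is an illustrative sketch, not LightX2V's code: each transformer block's weights live in CPU memory and are moved to the GPU only for the duration of that block's forward pass (LightX2V additionally overlaps these transfers asynchronously with computation).

```python
import torch

def blockwise_offload_forward(blocks, x):
    """Run CPU-resident blocks one at a time on the accelerator.

    Only one block's weights occupy device memory at any moment
    (block-level offload granularity). Falls back to CPU when no GPU
    is available, so the sketch stays runnable anywhere.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = x.to(device)
    for block in blocks:
        block.to(device)          # upload this block's weights
        with torch.no_grad():
            x = block(x)          # compute on the accelerator
        block.to("cpu")           # release device memory for the next block
    return x
```

The trade-off is exactly what the tables show: peak VRAM drops to roughly one block's working set, at the cost of extra host-to-device transfer time per step.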
---
## 📁 Configuration Files Reference
Benchmark-related configuration files and execution scripts are available at:
| Type | Link | Description |
|:-----|:-----|:------------|
| **Configuration Files** | [configs/bench](https://github.com/ModelTC/LightX2V/tree/main/configs/bench) | Contains JSON files with various optimization configurations |
| **Execution Scripts** | [scripts/bench](https://github.com/ModelTC/LightX2V/tree/main/scripts/bench) | Contains benchmark execution scripts |
---
> 💡 **Tip**: It is recommended to choose the appropriate optimization solution based on your hardware configuration to achieve the best performance.