# 🚀 Benchmark

> This document presents LightX2V performance test results on two hardware platforms, H200 and RTX 4090, with detailed comparison data for each.

---

## 🖥️ H200 Environment (~140GB VRAM)

### 📋 Software Environment Configuration

| Component | Version |
|:----------|:--------|
| **Python** | 3.11 |
| **PyTorch** | 2.7.1+cu128 |
| **SageAttention** | 2.2.0 |
| **vLLM** | 0.9.2 |
| **sgl-kernel** | 0.1.8 |

---

### 🎬 480P 5s Video Test

**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=42`, `enable_cfg=True`

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 366 | 71 | 1.0x | |
| **FastVideo** | 292 | 26 | **1.25x** | |
| **LightX2V_1** | 250 | 53 | **1.46x** | |
| **LightX2V_2** | 216 | 50 | **1.70x** | |
| **LightX2V_3** | 191 | 35 | **1.92x** | |
| **LightX2V_3-Distill** | 14 | 35 | **🏆 20.85x** | |
| **LightX2V_4** | 107 | 35 | **3.41x** | |

---

### 🎬 720P 5s Video Test

**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=1234`, `enable_cfg=True`

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 974 | 81 | 1.0x | |
| **FastVideo** | 914 | 40 | **1.07x** | |
| **LightX2V_1** | 807 | 65 | **1.21x** | |
| **LightX2V_2** | 751 | 57 | **1.30x** | |
| **LightX2V_3** | 671 | 43 | **1.45x** | |
| **LightX2V_3-Distill** | 44 | 43 | **🏆 22.14x** | |
| **LightX2V_4** | 344 | 46 | **2.83x** | |

---

## 🖥️ RTX 4090 Environment (~24GB VRAM)

### 📋 Software Environment Configuration

| Component | Version |
|:----------|:--------|
| **Python** | 3.9.16 |
| **PyTorch** | 2.5.1+cu124 |
| **SageAttention** | 2.1.0 |
| **vLLM** | 0.6.6 |
| **sgl-kernel** | 0.0.5 |
| **q8-kernels** | 0.0.0 |

---

### 🎬 480P 5s Video Test

**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=42`, `enable_cfg=True`

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2GP(profile=3)** | 779 | 20 | **1.0x** | |
| **LightX2V_5** | 738 | 16 | **1.05x** | |
| **LightX2V_5-Distill** | 68 | 16 | **11.45x** | |
| **LightX2V_6** | 630 | 12 | **1.24x** | |
| **LightX2V_6-Distill** | 63 | 12 | **🏆 12.36x** | |

---

### 🎬 720P 5s Video Test

**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=1234`, `enable_cfg=True`

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2GP(profile=3)** | -- | OOM | -- | |
| **LightX2V_5** | 2473 | 23 | -- | |
| **LightX2V_5-Distill** | 183 | 23 | -- | |
| **LightX2V_6** | 2169 | 18 | -- | |
| **LightX2V_6-Distill** | 171 | 18 | -- | |

---

## 📖 Configuration Descriptions

### 🖥️ H200 Environment Configuration Descriptions

| Configuration | Technical Features |
|:--------------|:------------------|
| **Wan2.1 Official** | The original implementation from the [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1) |
| **FastVideo** | Based on the [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using the SageAttention2 backend optimization |
| **LightX2V_1** | Replaces the native attention mechanism with SageAttention2 and runs the DiT in BF16+FP32 mixed precision (FP32 for some sensitive layers), improving computational efficiency while maintaining precision |
| **LightX2V_2** | Runs everything in BF16, further reducing memory usage and computational overhead while maintaining generation quality |
| **LightX2V_3** | Adds FP8 quantization to lower the computational precision significantly, combined with Tiling VAE to reduce memory usage |
| **LightX2V_3-Distill** | Based on LightX2V_3, using a 4-step distilled model (`infer_steps=4`, `enable_cfg=False`), which cuts the number of inference steps while maintaining generation quality |
| **LightX2V_4** | Based on LightX2V_3 with TeaCache (`teacache_thresh=0.2`) cache reuse, which accelerates inference by skipping redundant computation |

### 🖥️ RTX 4090 Environment Configuration Descriptions

| Configuration | Technical Features |
|:--------------|:------------------|
| **Wan2GP(profile=3)** | Based on the [Wan2GP repository](https://github.com/deepbeepmeep/Wan2GP), using MMGP optimization. The profile=3 configuration targets RTX 3090/4090 environments with at least 32GB RAM and 24GB VRAM; it accommodates limited system RAM at the cost of VRAM. Uses quantized models: [480P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors) and [720P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors) |
| **LightX2V_5** | Replaces the native attention mechanism with SageAttention2 and runs the DiT in FP8+FP32 mixed precision (FP32 for some sensitive layers). Enables CPU offload: DiT inference data is offloaded to the CPU asynchronously, at block-level granularity, to save VRAM |
| **LightX2V_5-Distill** | Based on LightX2V_5, using a 4-step distilled model (`infer_steps=4`, `enable_cfg=False`), which cuts the number of inference steps while maintaining generation quality |
| **LightX2V_6** | Based on LightX2V_3 with CPU offload enabled: some sensitive layers run in FP32, and DiT inference data is offloaded to the CPU asynchronously, at block-level granularity, to save VRAM |
| **LightX2V_6-Distill** | Based on LightX2V_6, using a 4-step distilled model (`infer_steps=4`, `enable_cfg=False`), which cuts the number of inference steps while maintaining generation quality |

---

## 📁 Configuration Files Reference

Benchmark-related configuration files and execution scripts are available at:

| Type | Link | Description |
|:-----|:-----|:------------|
| **Configuration Files** | [configs/bench](https://github.com/ModelTC/LightX2V/tree/main/configs/bench) | JSON files for the various optimization configurations |
| **Execution Scripts** | [scripts/bench](https://github.com/ModelTC/LightX2V/tree/main/scripts/bench) | Benchmark execution scripts |

---

> 💡 **Tip**: Choose the optimization configuration that matches your hardware to get the best performance.
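
The Speedup columns in the tables above appear to be the baseline inference time divided by each configuration's inference time. The following is a minimal sketch under that assumption, using the H200 720P timings from this document; the helper name `speedup_vs_baseline` is hypothetical and not part of LightX2V. Other tables may have been computed from unrounded timings, so values recomputed from the rounded times can differ slightly.

```python
# Inference times in seconds, copied from the H200 720P table in this document.
BASELINE = "Wan2.1 Official"
inference_time_s = {
    "Wan2.1 Official": 974,
    "FastVideo": 914,
    "LightX2V_1": 807,
    "LightX2V_2": 751,
    "LightX2V_3": 671,
    "LightX2V_3-Distill": 44,
    "LightX2V_4": 344,
}


def speedup_vs_baseline(times, baseline):
    """Return baseline_time / config_time for every configuration.

    Assumption: this is how the Speedup column is defined; the
    document does not state the formula explicitly.
    """
    base = times[baseline]
    return {name: round(base / t, 2) for name, t in times.items()}


speedups = speedup_vs_baseline(inference_time_s, BASELINE)
for name, value in speedups.items():
    print(f"{name:<20} {value:.2f}x")
```

The same helper can be pointed at your own measurements to compare a new configuration against whichever baseline fits your hardware.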