Commit 1292c53b authored by gushiqiao

Update benchmark

parent 927ce73a
@@ -49,7 +49,7 @@ For comprehensive usage instructions, please refer to our documentation: **[Engl
## 🚀 Core Features

### 🎯 **Ultimate Performance Optimization**

- **🔥 SOTA Inference Speed**: Achieve **~20x** acceleration via step distillation and system optimization (single GPU)
- **⚡️ Revolutionary 4-Step Distillation**: Compress the original 40-50 step inference to just 4 steps, with no CFG pass required
- **🛠️ Advanced Operator Support**: Integrated with cutting-edge operators including [Sage Attention](https://github.com/thu-ml/SageAttention), [Flash Attention](https://github.com/Dao-AILab/flash-attention), [Radial Attention](https://github.com/mit-han-lab/radial-attention), [q8-kernel](https://github.com/KONAKONA666/q8_kernels), [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel), [vllm](https://github.com/vllm-project/vllm)
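To make the step-distillation claim concrete, here is a rough sketch of the schedule change it implies. This is illustrative plain Python, not LightX2V's actual scheduler API; the function name and the even spacing are assumptions.

```python
# Illustrative sketch of the schedule change behind 4-step distillation.
# Names and the even-spacing assumption are ours, not LightX2V's API.

def make_timesteps(num_steps: int, num_train_timesteps: int = 1000) -> list:
    """Evenly spaced denoising timesteps, highest noise first."""
    stride = num_train_timesteps / num_steps
    return [round(num_train_timesteps - i * stride) for i in range(num_steps)]

full = make_timesteps(40)      # original schedule: 40 model calls (x2 with CFG)
distilled = make_timesteps(4)  # distilled schedule: 4 calls, no CFG pass

print(distilled)  # [1000, 750, 500, 250]
```

Since classifier-free guidance doubles the forward passes per step, 40 steps with CFG versus 4 steps without is a 20x reduction in model calls, which lines up with the ~20x figure above.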
@@ -47,7 +47,7 @@

## 🚀 Core Features

### 🎯 **Ultimate Performance Optimization**

- **🔥 SOTA Inference Speed**: Achieve a **20x** acceleration via step distillation and system optimization (single GPU)
- **⚡️ Revolutionary 4-Step Distillation**: Compress the original 40-50 step inference to just 4 steps, with no CFG configuration required
- **🛠️ Advanced Operator Support**: Integrated with cutting-edge operators, including [Sage Attention](https://github.com/thu-ml/SageAttention), [Flash Attention](https://github.com/Dao-AILab/flash-attention), [Radial Attention](https://github.com/mit-han-lab/radial-attention), [q8-kernel](https://github.com/KONAKONA666/q8_kernels), [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel), [vllm](https://github.com/vllm-project/vllm)
@@ -5,11 +5,11 @@
## H200 (~140GB VRAM)

**Software Environment:**

- **Python**: 3.11
- **PyTorch**: 2.7.1+cu128
- **SageAttention**: 2.2.0
- **vLLM**: 0.9.2
- **sgl-kernel**: 0.1.8
### 480P 5s Video
@@ -19,14 +19,15 @@
#### Performance Comparison

| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 366 | 71 | 1.0x | <video src="https://github.com/user-attachments/assets/24fb112e-c868-4484-b7f0-d9542979c2c3" width="200px"></video> |
| **FastVideo** | 292 | 26 | **1.25x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 250 | 53 | **1.46x** | <video src="https://github.com/user-attachments/assets/7bffe48f-e433-430b-91dc-ac745908ba3a" width="200px"></video> |
| **LightX2V_2** | 216 | 50 | **1.70x** | <video src="https://github.com/user-attachments/assets/0a24ca47-c466-433e-8a53-96f259d19841" width="200px"></video> |
| **LightX2V_3** | 191 | 35 | **1.92x** | <video src="https://github.com/user-attachments/assets/970c73d3-1d60-444e-b64d-9bf8af9b19f1" width="200px"></video> |
| **LightX2V_3-Distill** | 14 | 35 | **🏆 20.85x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 107 | 35 | **3.41x** | <video src="https://github.com/user-attachments/assets/49cd2760-4be2-432c-bf4e-01af9a1303dd" width="200px"></video> |
### 720P 5s Video
@@ -34,7 +35,17 @@
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: infer_steps=40, seed=42, enable_cfg=True

#### Performance Comparison
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 974 | 81 | 1.0x | <video src="" width="200px"></video> |
| **FastVideo** | 914 | 40 | **1.07x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 807 | 65 | **1.21x** | <video src="" width="200px"></video> |
| **LightX2V_2** | 751 | 57 | **1.30x** | <video src="" width="200px"></video> |
| **LightX2V_3** | 671 | 43 | **1.45x** | <video src="" width="200px"></video> |
| **LightX2V_3-Distill** | 44 | 43 | **🏆 22.14x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 344 | 46 | **2.83x** | <video src="" width="200px"></video> |
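The Speedup column in these tables is simply the baseline inference time divided by each configuration's inference time. Checking a few of the 720P rows:

```python
# Reproduce the 720P Speedup column: baseline time / configuration time.
baseline = 974  # Wan2.1 Official inference time (s), from the table above
times = {"FastVideo": 914, "LightX2V_1": 807, "LightX2V_3-Distill": 44}
speedups = {name: round(baseline / t, 2) for name, t in times.items()}
print(speedups)  # {'FastVideo': 1.07, 'LightX2V_1': 1.21, 'LightX2V_3-Distill': 22.14}
```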
---

@@ -50,11 +61,12 @@

---

## Configuration Descriptions
- **Wan2.1 Official**: Based on the [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1)
- **FastVideo**: Based on the [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using the SageAttention backend
- **LightX2V_1**: Replaces the native attention mechanism with SageAttention2 and uses DIT BF16+FP32 mixed precision (for a few sensitive layers), improving computational efficiency while maintaining precision
- **LightX2V_2**: Unified BF16 precision computation, further reducing memory usage and computational overhead while maintaining generation quality
- **LightX2V_3**: Introduces FP8 quantization to significantly lower compute precision requirements, combined with Tiling VAE to optimize memory usage
- **LightX2V_3-Distill**: Based on LightX2V_3, uses the 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality
- **LightX2V_4**: Based on LightX2V_3, adds TeaCache (teacache_thresh=0.2) cache-reuse technology, accelerating inference by intelligently skipping redundant computation
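The cache-reuse idea behind LightX2V_4 can be sketched roughly as follows. This is illustrative pure-Python logic, not the actual TeaCache implementation; only the 0.2 threshold comes from the configuration above.

```python
# Rough sketch of a TeaCache-style skip decision: accumulate the relative
# change of the model input across denoising steps, and reuse the cached
# output while the accumulated change stays below a threshold. All names
# are illustrative; only teacache_thresh=0.2 comes from the benchmark.

def should_skip(prev_feat, cur_feat, accumulated, thresh=0.2):
    num = sum(abs(c - p) for c, p in zip(cur_feat, prev_feat))
    den = sum(abs(p) for p in prev_feat) or 1.0
    accumulated += num / den
    if accumulated < thresh:
        return True, accumulated   # input barely moved: reuse cached residual
    return False, 0.0              # recompute this step, reset the accumulator
```

A higher threshold skips more steps (faster but lossier); 0.2 is the setting benchmarked here.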
@@ -4,12 +4,12 @@
## H200 (~140GB VRAM)

**Software Environment:**

- **Python**: 3.11
- **PyTorch**: 2.7.1+cu128
- **SageAttention**: 2.2.0
- **vLLM**: 0.9.2
- **sgl-kernel**: 0.1.8
### 480P 5s Video
@@ -19,23 +19,34 @@
#### Performance Comparison

| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 366 | 71 | 1.0x | <video src="https://github.com/user-attachments/assets/24fb112e-c868-4484-b7f0-d9542979c2c3" width="200px"></video> |
| **FastVideo** | 292 | 26 | **1.25x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 250 | 53 | **1.46x** | <video src="https://github.com/user-attachments/assets/7bffe48f-e433-430b-91dc-ac745908ba3a" width="200px"></video> |
| **LightX2V_2** | 216 | 50 | **1.70x** | <video src="https://github.com/user-attachments/assets/0a24ca47-c466-433e-8a53-96f259d19841" width="200px"></video> |
| **LightX2V_3** | 191 | 35 | **1.92x** | <video src="https://github.com/user-attachments/assets/970c73d3-1d60-444e-b64d-9bf8af9b19f1" width="200px"></video> |
| **LightX2V_3-Distill** | 14 | 35 | **🏆 20.85x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 107 | 35 | **3.41x** | <video src="https://github.com/user-attachments/assets/49cd2760-4be2-432c-bf4e-01af9a1303dd" width="200px"></video> |
### 720P 5s Video

**Test Configuration:**

- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: infer_steps=40, seed=42, enable_cfg=True

#### Performance Comparison
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 974 | 81 | 1.0x | <video src="" width="200px"></video> |
| **FastVideo** | 914 | 40 | **1.07x** | <video src="" width="200px"></video> |
| **LightX2V_1** | 807 | 65 | **1.21x** | <video src="" width="200px"></video> |
| **LightX2V_2** | 751 | 57 | **1.30x** | <video src="" width="200px"></video> |
| **LightX2V_3** | 671 | 43 | **1.45x** | <video src="" width="200px"></video> |
| **LightX2V_3-Distill** | 44 | 43 | **🏆 22.14x** | <video src="" width="200px"></video> |
| **LightX2V_4** | 344 | 46 | **2.83x** | <video src="" width="200px"></video> |
---

@@ -51,12 +62,12 @@

---

## Configuration Descriptions
- **Wan2.1 Official**: Based on the [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1)
- **FastVideo**: Based on the [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using the SageAttention backend
- **LightX2V_1**: Replaces the native attention mechanism with SageAttention2 and uses DIT BF16+FP32 mixed precision (for a few sensitive layers), improving computational efficiency while maintaining precision
- **LightX2V_2**: Unified BF16 precision computation, further reducing memory usage and computational overhead while maintaining generation quality
- **LightX2V_3**: Introduces FP8 quantization to significantly lower compute precision requirements, combined with Tiling VAE to optimize memory usage
- **LightX2V_3-Distill**: Based on LightX2V_3, uses the 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality
- **LightX2V_4**: Based on LightX2V_3, adds TeaCache (teacache_thresh=0.2) cache-reuse technology, accelerating inference by intelligently skipping redundant computation
@@ -96,7 +96,7 @@ class WanModel:
        return weight_dict

    def _load_quant_ckpt(self, use_bf16, skip_bf16):
        ckpt_path = self.dit_quantized_ckpt
        logger.info(f"Loading quant dit model from {ckpt_path}")
        index_files = [f for f in os.listdir(ckpt_path) if f.endswith(".index.json")]
@@ -126,7 +126,7 @@ class WanModel:
        return weight_dict

    def _load_quant_split_ckpt(self, use_bf16, skip_bf16):
        lazy_load_model_path = self.dit_quantized_ckpt
        logger.info(f"Loading split quant model from {lazy_load_model_path}")
        pre_post_weight_dict = {}
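For context on what the `.index.json` lookup in `_load_quant_ckpt` is doing, here is a self-contained sketch. It assumes the index follows the Hugging-Face-style sharded-checkpoint convention (a `weight_map` from weight names to shard files); the file names in the comments are made up for illustration.

```python
import json
import os

# Sketch of resolving a sharded checkpoint via its .index.json, as the
# loaders above do: the index maps each weight name to the shard file
# that stores it, so shards can be opened lazily.

def resolve_weight_map(ckpt_path):
    index_files = [f for f in os.listdir(ckpt_path) if f.endswith(".index.json")]
    if not index_files:
        raise FileNotFoundError(f"no .index.json under {ckpt_path}")
    with open(os.path.join(ckpt_path, index_files[0])) as fh:
        index = json.load(fh)
    # e.g. {"blocks.0.attn.q.weight": "model-00001-of-00002.safetensors", ...}
    return index["weight_map"]
```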